Posts

Why I Don't Believe The Law of the Excluded Middle 2023-09-18T18:53:48.704Z
"Throwing Exceptions" Is A Strange Programming Pattern 2023-08-21T18:50:44.102Z
Optimizing For Approval And Disapproval 2023-07-24T18:46:15.223Z
Thoth Hermes's Shortform 2023-07-13T15:50:19.366Z
The "Loss Function of Reality" Is Not So Spiky and Unpredictable 2023-06-17T21:43:25.908Z
What would a post that argues against the Orthogonality Thesis that LessWrong users approve of look like? 2023-06-03T21:21:48.602Z
Colors Appear To Have Almost-Universal Symbolic Associations 2023-05-20T18:40:25.989Z
Why doesn't the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a "fairly capable" agent will have at least some non-negligible fraction of overlap with human values? 2023-05-16T18:02:15.836Z
Ontologies Should Be Backwards-Compatible 2023-05-14T17:21:03.640Z
Where "the Sequences" Are Wrong 2023-05-07T20:21:35.178Z
The Great Ideological Conflict: Intuitionists vs. Establishmentarians 2023-04-27T01:49:52.732Z
Deception Strategies 2023-04-20T15:59:02.443Z
The Truth About False 2023-04-15T01:01:55.572Z
Binaristic Bifurcation: How Reality Splits Into Two Separate Binaries 2023-04-11T21:19:55.231Z
Is there a fundamental distinction between simulating a mind and simulating *being* a mind? Is this a useful and important distinction? 2023-04-08T23:44:42.851Z
Why do the Sequences say that "Löb's Theorem shows that a mathematical system cannot assert its own soundness without becoming inconsistent."? 2023-03-28T17:19:12.089Z

Comments

Comment by Thoth Hermes (thoth-hermes) on Commentless downvoting is not a good way to fight infohazards · 2023-09-26T16:18:30.116Z · LW · GW

I have to agree that commentless downvoting is not a good way to combat infohazards. I'd probably take it a step further and argue that it's not a good way to combat anything, which is why it's not a good way to combat infohazards (and if you disagree that infohazards are ultimately as bad as they are made out to be, then it would probably mean it's a bad thing to try to combat them). 

Its commentless nature means it violates "norm one" (and violates it much more as a super-downvote).  

It means something different from "pushing up stuff that's not that," while also being an alternative to doing that.  

I think a complete explanation of why it's not a very good idea doesn't exist yet though, and is still needed.

However, I think there's another thing to consider: Imagine if up-votes and down-votes were all accurately placed. Would they bother you as much? They might not bother you at all if they seemed accurate to you, and therefore if they do bother you, that suggests that the real problem is that they aren't even accurate. 

My feeling is that commentless downvotes are likely a contributing mechanism to the process that leads them to be placed inaccurately, but it is possible that something else is causing them to do that.  

Comment by Thoth Hermes (thoth-hermes) on Open Thread – Autumn 2023 · 2023-09-25T01:28:03.734Z · LW · GW

It's a priori very unlikely that any post that's clearly made up of English sentences actually does not even try to communicate anything.

My point is that basically, you could have posted this as a comment on the post instead of it being rejected.

Whenever there is room to disagree about what mistakes have been made and how bad those mistakes are, it becomes more of a problem to apply an exclusion rule like this.

There's a lot of questions here: how far along the axis to apply the rule, which axis or axes are being considered, and how harsh the application of the rule actually is.

It should always be smooth gradients, never sudden discontinuities. Smooth gradients give the person you're applying them to room to update. Sudden discontinuities hurt; people remember that, and if they come back at all, they come back still remembering it.

Comment by Thoth Hermes (thoth-hermes) on Open Thread – Autumn 2023 · 2023-09-23T21:42:52.082Z · LW · GW

It was a mistake to reject this post. This seems like a case where both the rule that was applied is a mis-rule, as well as that it was applied inaccurately - which makes the rejection even harder to justify. It is also not easy to determine which "prior discussion" is being referred to by the rejection reasons.

It doesn't seem like the post was political...at all? Let alone "overly political," which I think is perhaps kind of mind-killy to apply frequently as a reason for rejection. It also is about a subject that is fairly interesting to me, at least: sentiment drift on Wikipedia.

It seems the author is a 17-year-old girl, by the way. 

This isn't just about standards being too harsh, but about whether they are even being applied correctly to begin with.

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-22T20:26:39.678Z · LW · GW

You write in an extremely fuzzy way that I find hard to understand.

This does. This is the type of criticism that one can't easily translate into an update to one's practice. You're not saying whether I always do this or just in this particular spot, nor whether it's due to my "writing" (i.e., style) or to actually using confused concepts. Also, it's usually not the case that anyone is trying to be worse at communicating, which is why it sounds like a scold.

You have to be careful using blanket "this is false" or "I can't understand any of this," as these statements are inherently difficult to extract from moral judgements. 

I'm sorry if it was hard to understand; you are always free to ask more specific questions. 

To attempt to clarify it a bit more, I'm not trying to say that worse is better. It's that you can't consider rules (i.e. yes / no conditionals) to be absolutely indispensable. 

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-22T15:01:43.484Z · LW · GW

It is probably indeed a crux but I don't see the reason for needing to scold someone over it.

(That's against my commenting norms by the way, which I'll note that so far you, TAG, and Richard_Kennaway have violated, but I am not going to ban anyone over it. I still appreciate comments on my posts at all, and do hope that everyone still participates. In the olden days, it was Lumifer that used to come and do the same thing.)

I have an expectation that people do not continually mix up critique with scorn: please keep those things separate as much as possible, and apply the latter only with solid justification.

You can see that yes, one of the points I am trying to make is that an assertion / insistence on consistency seems to generally make things worse. This itself isn't that controversial, but what I'd like to do is find better ways to articulate whatever the alternatives to that may be, here.

It's true that one of the main implications of the post is that imprecision is not enough to kill us (but that precision is still a desirable thing). We don't have rules that are simply tautologies or simply false anymore.

At least we're not physicists. They have to deal with things like negative probability, and I'm not even anywhere close to that yet.

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-21T22:02:20.491Z · LW · GW

First, a question, am I correct in understanding that when you write ~(A and ~A), the first ~ is a typo and you meant to write A and ~A (without the first ~)? Because ~(A and ~A) is a tautology and thus maps to true rather than to false.

I thought of this shortly before you posted this response, and I think that we are probably still okay (even though strictly speaking yes, there was a typo). 

Normally we have that ~A means: ~A --> A --> False. However, remember that I am now saying that we can no longer say that "~A" means that "A is False."

So I wrote: 

~(A and ~A) --> A or ~A or (A and ~A)

And it could / should have been:

~(A and ~A) --> (A and ~A) --> False (can omit)[1] or A or ~A or (A and ~A).

So, because of False now being something that an operator "bounces off of", technically, we can kind of shorten those formulas. 

Of course this sort of proof doesn't capture the paradoxicalness that you are aiming to capture. But in order for the proof to be invalid, you'd have to invalidate one of the basic inference rules for "and" and "or", both of which seem really fundamental to logic. I mean, what do the operators "and" and "or" even mean, if they don't validate this?

Well, I'd have to conclude that we no longer consider any rules indispensable, per se.  However, I do think "and" and "or" are more indispensable and map to "not not" (two negations) and one negation, respectively. 

  1. ^

    False can be re-admitted if we were to decide, for example, that whatever we just wrote was wrong and we needed to exit the chain there and restart. However, I don't usually prefer that option.

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-21T17:33:37.155Z · LW · GW

Well, to use your "real world" example, isn't that just the definition of a manifold (a space that, when zoomed in far enough, looks flat)?

I think it satisfies the either-or-"mysterious third thing" formulae.

~(Earth flat and earth ~flat) --> Earth flat (zoomed in) or earth spherical (zoomed out) or (earth more flat-ish the more zoomed in and vice-versa).

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-21T15:17:31.178Z · LW · GW

So suppose I have ~(A and ~A). Rather than have this map to False, I say that "False" is an object that you always bounce off of; it causes you to reverse course, in the following way:

~(A and ~A) --> False --> A or ~A or (some mysterious third thing). What is this mysterious third thing? Well, if you insist that A and ~A is possible, then it must be an admixture of these two things, but you'd need to show me what it is for that to be allowed. In other words:

~(A and ~A) --> A or ~A or (A and ~A).

What this statement means in semantic terms is: Suppose you give me a contradiction. Rather than simply try really hard to believe it, or throw everything away entirely, I have a choice between believing A, believing ~A, or believing a synthesis between these two things. 

The most important feature of this construction is that I am no longer faced with simply concluding "false" and throwing it all away. 

Two examples:

Suppose we have the statement 1 = 2[1]. In most default contexts, this statement simply maps to "false," because it is assumed that this statement is an assertion that the two symbols to the left and right of the equals sign are indistinguishable from one another. 

But what I'm arguing is that "False" is not the end-all, be-all of what this statement can or will be said to mean in all possible universes forever unto eternity. "False" is one possible meaning which is also valid, but it cannot be the only thing that this means. 

So, using our formula from above:

1 = 2 -->[2] 1 or 2 or (1 and 2). So if you tell me "1 = 2", in return I tell you that you can have either 1, either 2, or either some mysterious third thing which is somehow both 1 and 2 at the same time. 

So you propose to me that (1 and 2) might mean something like 2 (1/2), that is, two halves, which mysteriously are somehow both 1 and 2 at the same time when put together. Great! We've invented the concept of 1/2. 

Second example:

We don't know if A is T and thus that ~A is F or vice-versa. Therefore we do not know if A and ~A is TF or FT. Somehow, it's got to be mysteriously both of these at the same time. And it's totally fine if you don't get what I'm about to say because I haven't really written it anywhere else yet, but this seems to produce two operators, call them "S" (for swap) and "2" (for 2), each duals of one another.

S is the Swaperator, and 2 is the Two...perator. These also buy you the concept of 1/2. But all that deserves more spelling out; I was just excited to bring it up. 

  1. ^

    It is arguably appropriate to use 1 == 2 as well, but I want to show that a single equals sign "=" is open to more interpretations because it is more basic. It also has a slightly different meaning, which is that the symbols 1 and 2 are swappable with one another. 

  2. ^

    You could possibly say "--> False or 1 or 2 or ...", too, but then you'd probably not select False from those options, so I think it's okay to omit it.  

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-20T20:03:03.853Z · LW · GW

I give only maybe a 50% chance that any of the following adequately addresses your concern. 

I think the succinct answer to your question is that it only matters if you happened to give me, e.g., a "2" (or anything else) and you asked me what it was and gave me your {0,1} set. In other words, you lose the ability to prove that 2 is 1 because it's not 0, but I'm not that worried about that.

It appears to be commonly said (see the last paragraph of "Mathematical Constructivism") that proof assistants like Agda or Coq rely on not assuming LoEM. I think this is because proof assistants rely on the principle of "you can't prove something false, only true." Theorems are the [return] types of proofs, and the "False" theorem has no inhabitants (proofs). 

The law of the excluded middle also seems to me like an insistence that certain questions (like paradoxes) actually remain unanswered. 

That's an argument that it might not be true at all, rather than simply partially true or only not true in weird, esoteric logics.

Besides the one use-case of the paradoxical market "Will this market resolve to no?", which I expect resolves to 1/2, there may also be:

Start with two-valued logic and negation, as well as a two-member set, e.g., {blue, yellow}. I suppose we could also include a "middle" element. Including this excluded middle might make the set no longer closed under negation: ~blue = yellow and ~yellow = blue, but what about green, which is neither blue nor yellow, yet somehow both, mysteriously? Additionally, we might not be able to say for sure that it is neither blue nor yellow, since there are greens close to blue that look bluish, and greens close to yellow that look yellowish. You can also imagine the pixels in a green square actually being tiled blue next to yellow next to blue, etc., or simply being green pixels; each seems to produce the same effect viewed from far away. 

So a statement like "x = blue" evaluates to true in an ordinary two-valued logic if x = blue, and false otherwise. But in a {0, 1/2, 1} logic, that statement evaluates to 1/2 if x is green, for example. 
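
To make the {0, 1/2, 1} idea concrete, here is a minimal sketch (my own illustration, not anything from the thread) of evaluating "x = blue" in such a logic, with green treated as the half-true case:

```python
# Minimal sketch: a {0, 1/2, 1} "truth value" for color equality,
# where green counts as the mysterious half-blue, half-yellow case.
from fractions import Fraction

HALF = Fraction(1, 2)

def truth_of_equals(x: str, target: str) -> Fraction:
    """Evaluate 'x = target' in a three-valued {0, 1/2, 1} logic."""
    if x == target:
        return Fraction(1)            # plainly true
    if x == "green" and target in ("blue", "yellow"):
        return HALF                   # the "mysterious third thing": partly both
    return Fraction(0)                # plainly false

def negate(v: Fraction) -> Fraction:
    """Negation maps 1 -> 0, 0 -> 1, and leaves 1/2 fixed."""
    return 1 - v

print(truth_of_equals("blue", "blue"))           # 1
print(truth_of_equals("yellow", "blue"))         # 0
print(truth_of_equals("green", "blue"))          # 1/2
print(negate(truth_of_equals("green", "blue")))  # 1/2 -- neither it nor its negation is fully true
```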

Comment by Thoth Hermes (thoth-hermes) on "Throwing Exceptions" Is A Strange Programming Pattern · 2023-09-20T13:06:57.563Z · LW · GW

I really don't think I can accept this objection. They are clearly considered both of these, most of the time.

I would really prefer that, if you want to find something to have a problem with, first it's got to be true, then it's got to be meaningful.

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-19T14:47:14.564Z · LW · GW

I created this self-referential market on Manifold to test the prediction that the truth-value of such a paradox is in fact 1/2. Very few participated, but I think it should always resolve to around 50%. Rather than say such paradoxes are meaningless, I think they can be meaningfully assigned a truth-value of 1/2.
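
One way to formalize why 1/2 is the natural fixed point (a sketch, under the assumption that the market's probability should track the truth of its own question): let $p$ be the probability assigned to YES on "Will this market resolve to NO?". Then

$$p = P(\text{resolves NO}) = 1 - p \quad\Longrightarrow\quad p = \tfrac{1}{2},$$

which is the only self-consistent value.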

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-19T14:28:46.646Z · LW · GW

what I think is "of course there are strong and weak beliefs!" but true and false is only defined relative to who is asking and why (in some cases), so you need to consider the context in which you're applying LoEM.

Like in my comment to Richard_Kennaway about probability, I am not just talking about beliefs, but about what is. Do we take it as an axiom or a theorem that A or ~A? Likewise for ~(A and ~A)? I admit to being confused about this. Also, does "A" mean the same thing as "A = True"? Does "~A" mean the same thing as "A = False"? If so, in what sense do we say that A literally equals True / False, respectively? Which things are axioms and which things are theorems, here? All of that confuses me.

Since we are often permitted to change our axioms and arrive at systems we either like or don't like, or like better than others, I think it's relevant to ask about our choice of axioms and whether or not logic is or should be considered a set of "pre-axioms." 

It seemed like tailcalled was implying that the law of non-contradiction was a theorem, and I'm confused about that as well. Under which axioms?

If I decide that ~(A and ~A) is not an axiom, then I can potentially have A and ~A either be true or not false. Then we would need some other arguments to support that choice. Without absolute truth and absolute falsehood, we'd have to move back to the concept of "we like [it] better or worse" which would make the latter more fundamental. Does allowing A and ~A to mean something get us any utility?

In order for it to get us any utility, there would have to be things that we'd agree were validly described by A and ~A. 

Admittedly, it does seem like these or's and and's and ='s keep appearing regardless of my choices, here (because I need them for the concept of choice). 

In a quasi-philosophical and quasi-logical post I have not posted to LessWrong yet, I argue that negation seems likely to be the most fundamental thing to me (besides the concept of "exists / is", which is what "true" means).  "False" is thus not quite the same thing as negation, and instead means something more like "nonsense gibberish" which is actually far stronger than negation.

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-18T22:16:43.388Z · LW · GW

A succinct way of putting this would be to ask: If I were to swap the phrase "law of the excluded middle" in the piece for the phrase "principle of bivalence" how much would the meaning of it change as well as overall correctness?

Additionally, suppose I changed the phrases in just "the correct spots." Does the whole piece still retain any coherence?

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-18T21:24:54.777Z · LW · GW

If there are propositions or axioms that imply each other fairly easily under common contextual assumptions, then I think it's reasonable to consider it not-quite-a-mistake to use the same name for such propositions.

One of the things I'm arguing is that I'm not convinced that imprecision is enough to render a work "false."

Are you convinced those mistakes are enough to render this piece false or incoherent?

That's a relevant question to the whole point of the post, too.

Comment by Thoth Hermes (thoth-hermes) on Why I Don't Believe The Law of the Excluded Middle · 2023-09-18T20:18:40.842Z · LW · GW

Indeed. (You don't need to link the main wiki entry, thanks.)

There's some subtlety, though: either P or not-P might be true, and p(P) expresses belief that P is true. So I think probability merely implies that the LoEM might be unnecessary, while itself pretty much assuming it.

It is sometimes, but not always, the case that p(P) = 0.5 resolves to P being "half-true" once observed. It can also mean that P resolves to true half the time, or just that we only know that it might be true with 0.5 certainty (the default meaning).

Comment by Thoth Hermes (thoth-hermes) on "Throwing Exceptions" Is A Strange Programming Pattern · 2023-09-18T19:15:27.715Z · LW · GW

The issue that I'm primarily talking about is not so much the way that errors are handled; it's more about the way of deciding what constitutes an exception to a general rule, as Google defines the word "exception":

a person or thing that is excluded from a general statement or does not follow a rule.

In other words, does everything need a rule to be applied to it? Does every rule need there to be some set of objects under which the rule is applied that lie on one side of the rule rather than the other (namely, the smaller side)? 

As soon as we step outside of binary rules, we are in case-when-land, where each category of objects is handled by a part of the automation that is expected to continue running. There is no longer a "does not follow" sense of the rule. That negation is the part doing the work that I take issue with.  
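
To illustrate the contrast (an invented sketch, not code from the post): the first function below splits the world into "follows the rule" and "does not follow" and aborts on the latter; the second handles every category with a branch that keeps the pipeline going.

```python
# Illustrative sketch (names invented); requires Python 3.10+ for match/case.

# Binary-rule style: everything outside one threshold is an "exception"
# and aborts the work in progress.
def process_strict(x: float) -> float:
    if not (0.0 <= x <= 1.0):
        raise ValueError(f"{x} does not follow the rule")  # the 'does not follow' side
    return x ** 2

# Case-when style: each category gets a branch that keeps going;
# there is no 'does not follow' side, only more cases.
def process_case_when(x: float) -> float:
    match x:
        case _ if x < 0.0:
            return 0.0        # clamp low inputs and continue
        case _ if x > 1.0:
            return 1.0        # clamp high inputs and continue
        case _:
            return x ** 2     # the 'normal' case
```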

Comment by Thoth Hermes (thoth-hermes) on The commenting restrictions on LessWrong seem bad · 2023-09-17T18:46:23.070Z · LW · GW

Raemon's comment below indicates mostly what I meant by: 

It seems from talking to the mods here and reading a few of their comments on this topic that they tend to lean towards them being harmful on average and thus need to be pushed down a bit.

Furthermore, I think the mods' stance on this is based primarily on Yudkowsky's piece here. I think the relevant portion of that piece is this (emphases mine):

But into this garden comes a fool, and the level of discussion drops a little—or more than a little, if the fool is very prolific in their posting.  (It is worse if the fool is just articulate enough that the former inhabitants of the garden feel obliged to respond, and correct misapprehensions—for then the fool dominates conversations.)

So the garden is tainted now, and it is less fun to play in; the old inhabitants, already invested there, will stay, but they are that much less likely to attract new blood.  Or if there are new members, their quality also has gone down.

Then another fool joins, and the two fools begin talking to each other, and at that point some of the old members, those with the highest standards and the best opportunities elsewhere, leave...

So, it seems to me that the relevant issues are the following. Being more tolerant of lower-quality discussion will cause:

  • Higher-quality members' efforts being directed toward less fruitful endeavors than they would otherwise be.
  • Higher-quality existing members to leave the community.
  • Higher-quality potential members who would otherwise have joined the community, not to.

My previous comment primarily refers to the notion of the first bullet-point in this list. But "harmful on average" also means all three. 

The issue I have most concern with is the belief that lower-quality members are capable of dominating the environment over higher-quality ones, with all-else-being-equal, and all members having roughly the same rights to interact with one another as they see fit. 

This mimics a conversation I was having with someone else recently about Musk's Twitter / X. They have different beliefs than I do about what happens when you try to implement a system that is inspired by Musk's ideology. But I encountered an obstacle in this conversation: I said I have always liked using it [Twitter / X], and it also seems to be slightly more enjoyable to use post-acquisition. He said he did not really enjoy using it, and also that it seems to be less enjoyable to use post-acquisition. Unfortunately, if it comes down to a matter of pure preferences like this, then I am not sure how one ought to proceed with such a debate. 

However, there is an empirical observation that one can make comparing environments that use voting systems or rank-based attention mechanisms: It should appear to one as though units of work that feel like more or better effort was applied to create them correlate with higher approval and lower disapproval. If this is not the case, then it is much harder to actually utilize feedback to improve one's own output incrementally. [1]

On LessWrong, that seems to me to be less the case than it does on Twitter / X. Karma does not seem correlated to my perceptions about my own work quality, whereas impressions and likes on Twitter / X do seem correlated. But this is only one person's observation, of course. Nonetheless I think it should be treated as useful data.

  1. ^

    That being said, it may be that the intention of the voting system matters: Upvotes / downvotes here mean "I want to see more of / I want to see less of" respectively. They aren't explicitly used to provide helpful feedback, and that may be why they seem uncorrelated with useful signal.  

Comment by Thoth Hermes (thoth-hermes) on The commenting restrictions on LessWrong seem bad · 2023-09-16T18:44:38.402Z · LW · GW

Both views seem symmetric to me:

  1. They were downvoted because they were controversial (and I agree with it / like it).
  2. They were downvoted because they were low-quality (and I disagree with it / dislike it).

Because I can sympathize with both views here, I think we should consider remaining agnostic to which is actually the case.

It seems like the major crux here is whether we think that debates over claim and counter-claim (basically, other cruxes) are likely to be useful or likely to cause harm. It seems from talking to the mods here and reading a few of their comments on this topic that they tend to lean towards them being harmful on average and thus need to be pushed down a bit.

Omnizoid's issue is not merely over quality, but over quality as well as over counter-claims to specific claims that have been dominant on LessWrong for some time.

The most agnostic side of the "top-level" crux that I mentioned above seems to point towards favoring agnosticism, and furthermore towards the view that if we predict debates to be more fruitful than not, then one needn't be too worried even if one is sure that one side of another crux truly is the lower-quality side of it.

Comment by Thoth Hermes (thoth-hermes) on Sharing Information About Nonlinear · 2023-09-11T16:42:05.619Z · LW · GW

It seems like a big part of this story is mainly about people who have relatively strict preferences kind of aggressively defending their territory and boundaries, and how when you have multiple people like this working together on relatively difficult tasks (like managing the logistics of travel), it creates an engine for lots of potential friction. 

Furthermore, when you add the status hierarchy of a typical organization, combined with the social norms that dictate how people's preferences and rights ought to be respected (and implicit agreements being made about how people have chosen to sacrifice some of those rights for altruism's sake), you add even more fuel to the aforementioned engine.

I think complaints such as these are probably okay to post, as long as everyone mentioned is afforded the right to update their behavior after enough time has passed to reflect and discuss these things (since actually negotiating what norms are appropriate here might end up being somewhat difficult).

Edit: I want to clarify that when there is a situation in which people have conflicting preferences and boundaries as I described, I do personally feel that those in leadership positions / of higher status probably bear the responsibility for seeing that their subordinates' preferences are satisfied, given that the higher-status people are having their own higher, longer-term preferences satisfied with the help of their subordinates. 

I don't want to make it seem as though the ones bringing the complaints are as equally responsible for this situation as the ones being complained about. 

Comment by Thoth Hermes (thoth-hermes) on Sharing Information About Nonlinear · 2023-09-11T01:12:23.958Z · LW · GW

I think it might actually be better if you just went ahead with a rebuttal, piece by piece, starting with whatever seems most pressing and you have an answer for.

I don't know if it is all that advantageous to put together a long mega-rebuttal post that counters everything at once.

Then you don't have that demand nagging at you for a week while you write the perfect presentation of your side of the story.

Comment by Thoth Hermes (thoth-hermes) on A quick update from Nonlinear · 2023-09-10T02:26:16.787Z · LW · GW

I think it would be difficult to implement what you're asking for without needing to decide, on behalf of others, whether investing time in this subject (or others) is worth anyone's time.

If you notice in yourself that you have conflicting feelings about whether something is good for you to be doing - e.g., in the sense you've described, that you feel pulled in by this but have misgivings about it - then I recommend treating the situation as uncertainty about what you ought to be doing, rather than as certainty that you should be doing something else and that you merely have some kind of addiction to drama.

It may in fact be that you feel pulled in because you actually can add value to the discussion, or at least that watching this is giving you some new knowledge in some way. It's at least a possibility.

Ultimately, it should be up to you, so if you're convinced it's not for you, so be it. However, I feel uncomfortable not allowing people to decide that for themselves.

Comment by Thoth Hermes (thoth-hermes) on Meta Questions about Metaphilosophy · 2023-09-02T18:59:18.616Z · LW · GW

It seems plausible that there is no such thing as "correct" metaphilosophy, and humans are just making up random stuff based on our priors and environment and that's it and there is no "right way" to do philosophy, similar to how there are no "right preferences".

We can always fall back to "well, we do seem to know what we and other people are talking about fairly often" whenever we encounter the problem of whether-or-not a "correct" this-or-that actually exists. Likewise, we can also reach a point where we seem to agree that "everyone seems to agree that our problems seem more-or-less solved" (or that they haven't been). 

I personally feel that there are strong reasons to believe that when those moments have been reached they are indeed rather correlated with reality itself, or at least correlated well-enough (even if there's always room to better correlate). 

Relatedly, philosophy is incredibly ungrounded and epistemologically fraught. It is extremely hard to think about these topics in ways that actually eventually cash out into something tangible

Thus, for said reasons, I probably feel more optimistic than you do about how difficult our philosophical problems are. My intuition is that the more it is true that "there is no problem to solve," the less we would feel that there is a problem to solve.  

Comment by Thoth Hermes (thoth-hermes) on [Linkpost] Michael Nielsen remarks on 'Oppenheimer' · 2023-08-31T16:08:10.782Z · LW · GW

If we permit that moral choices with very long-term time horizons can be made with the utmost well-meaning intentions and show evidence of admirable character traits, but nevertheless have difficult-to-see consequences with variable outcomes, then I think that limits us considerably in how much we can retrospectively judge specific individuals.

Comment by Thoth Hermes (thoth-hermes) on Anyone want to debate publicly about FDT? · 2023-08-29T16:45:48.716Z · LW · GW

I wouldn't aim to debate you but I could help you prepare for it, if you want. I'm also looking for someone to help me write something about the Orthogonality Thesis and I know you've written about it as well. I think there are probably things we could both add to each other's standard set of arguments.

Comment by Thoth Hermes (thoth-hermes) on Assume Bad Faith · 2023-08-25T18:16:23.606Z · LW · GW

I think that I largely agree with this post. I think that it's also a fairly non-trivial problem. 

The strategy that makes the most sense to me now is that one should argue with people as if they meant what they said, even if you don't currently believe that they do. 

But not always - especially if you want to engage with them on the point of whether they are indeed acting in bad faith, and there comes a time when that becomes necessary. 

I think pushing back against the norm that it's wrong to ever assume bad faith is a good idea. I don't think that people who do argue in bad faith do so completely independently, for two reasons: the first is simply that I've noticed it clusters into a few contexts; the second is that acting deceptively is inherently riskier than being honest, and so it makes more sense to tread well-trodden paths. More people aiding the same deception gives it the necessary weight.

It seems to cluster among things like morality (judgements about people's behaviors), dating preferences (which are kind of similar), and reputation. There is a kind of paradox I've noticed: people who tend to be preachy about what constitutes good or bad behavior will also be the ones who argue that everyone is always acting in good faith (and thus chastise or scold people who sometimes want to assume bad faith). 

People do behave altruistically, and they also have reasons to behave non-altruistically too, at times (whether or not it is actually a good idea for them personally). The whole range of possible intentions is native to the human psyche. 

Comment by Thoth Hermes (thoth-hermes) on "Throwing Exceptions" Is A Strange Programming Pattern · 2023-08-22T18:28:03.324Z · LW · GW

I think your view involves a bit of catastrophizing, or relying on broadly pessimistic predictions about the performance of others. 

Remember, the "exception throwing" behavior involves taking the entire space of outcomes and splitting it into two things: "Normal" and "Error." If we say this is what we ought to do in the general case, that's basically saying this binary property is inherent in the structure of the universe. 

But we know that there's no phenomenon that can be said to actually be an "error" in some absolute, metaphysical sense. This is an arbitrary decision that we make: We choose to abort the process and destroy work in progress when the range of observations falls outside of a single threshold. 

This only makes sense if we also believe that sending the possibly malformed output to the next stage in the work creates a snowball effect or an out-of-control process. 

There are probably environments where that is the case. But I don't think that it is the default case nor is it one that we'd want to engineer into our environment if we have any choice over that - which I believe we do. 

If the entire pipeline is made of checkpoints where exceptions can be thrown, then if I remove an earlier checkpoint, then it could mean that more time is wasted if it is destined to be thrown at a later time. But like I mentioned in the post, I usually think this is better, because I get more data about what the malformed input/output does to later steps in the process. Also, of course, if I remove all of the checkpoints, then it's no longer going to be wasted work. 

Mapping states to a binary range is a projection which loses information. If I instead tell you, "This is what I know, this is how much I know it," that seems better because it carries enough to still give you the projection if you wanted that, plus additional information.
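
A minimal sketch of that alternative (the names and types are my own invention): return the estimate together with a confidence, so downstream stages can keep working and can still recover the binary projection by thresholding if they want it.

```python
# Sketch: instead of projecting onto {Normal, Error}, pass along the estimate
# plus how much we trust it.
from typing import NamedTuple

class Result(NamedTuple):
    value: float       # this is what I know
    confidence: float  # this is how much I know it, in [0, 1]

def parse_reading(raw: str) -> Result:
    try:
        return Result(float(raw), confidence=1.0)
    except ValueError:
        # Malformed input: degrade confidence instead of aborting the pipeline.
        return Result(value=0.0, confidence=0.0)

def next_stage(r: Result) -> Result:
    # Downstream work continues, weighting by confidence rather than halting.
    return Result(r.value * 2.0, r.confidence * 0.9)

ok = next_stage(parse_reading("3.5"))    # Result(value=7.0, confidence=0.9)
bad = next_stage(parse_reading("oops"))  # Result(value=0.0, confidence=0.0)
```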

Sometimes years or decades. See the replicability crisis in psychology that's decades in the making, and the Schön scandal that wasted years of some researchers' time, just for the first two examples off the top of my head.

I don't know if I agree that those things have anything to do with people tolerating probability and using calibration to continue working under conditions of high uncertainty. 

The issue is not replication, but that results get built on; when that result gets overturned, a whole bunch of scaffolding collapses.

I think you're also saying that when you predict that people are limited or stunted in some capacity, we have to intervene to limit or stunt them even more, because there is some danger in letting them operate in their original capacity. 

It's like, "Well they could be useful, if they believed what I wanted them to. But they don't, and so, it's better to prevent them from working at all."

Comment by Thoth Hermes (thoth-hermes) on "Throwing Exceptions" Is A Strange Programming Pattern · 2023-08-22T15:57:22.958Z · LW · GW

This is a good reply, because its objections are close to things I already expect will be cruxes. 

If you need a strong guarantee of correctness, then this is quite important. I'm not so sure that this is always the case in machine learning, since ML models by their nature can usually train around various deficiencies;

Yeah, I'm interested in why we need strong guarantees of correctness in some contexts but not others, especially if we have control over that aspect of the system we're building as well. If we have choice over how much the system itself cares about errors, then I can design the system to be more robust to failure if I want it to be.

I think this is definitely highly context-dependent. A scientific result that is wrong is far worse than the lack of a result at all, because this gives a false sense of confidence, allowing for research to be built on wrong results, or for large amounts of research personpower to be wasted on research ideas/directions that depend on this wrong result. False confidence can be very detrimental in many cases.

I think the crux for me here is how long it takes before people notice that the belief in a wrong result causes them to receive further wrong results, null results, or reach dead-ends, and then causes them to update their wrong belief. LK-99 is the most recent instance that I have in memory (there aren't that many that I can recall, at least). 

What's the worst that happened from having false hope? Well, researchers spent time simulating and modeling the structure of it and tried to figure out if there was any possible pathway to superconductivity. There were several replication attempts. If that researcher-time-money is more valuable (meaning potentially more to lose), then that could be because the researcher quality is high, the time spent is long, or the money spent is very high. 

If the researcher quality is high (and they spent time doing this rather than something else), then presumably we also get better replication attempts, as well as more solid simulations / models. If they debunk it, then those are more reliable debunks. This prevents more researcher-time-money from being spent on it in the future. If they don't debunk it, that signal is more reliable, and so spending more on this is less likely to be a waste.

If researcher quality is low, then researcher-time-money may also be low, and thus there will be less that could be potentially wasted. I think the risk we are trying to avoid is losing high-quality researcher time that could be spent on other things. But if our highest-quality researchers also do high-quality debunkings, then we still gain something (or at least lose less) from their time spent on it. 

The universe itself also makes it so that being wrong will necessarily cause you to hit a dead-end, and if not, then you are presumably learning something, obtaining more data, etc. Situations like LK-99 may arise because before our knowledge gets to a high-enough level about some phenomenon, there is some ambiguity, where the signal we are looking for seems to be both present and not-present.  

If the system as a whole ("society") is good at recognizing signal that is more reliable without needing to be experts at the same level as its best experts, that's another way we avoid risk. 

I worked on dark matter experiments as an undergrad, and as far as I know, those experiments were built such that they were only really for testing the WIMP models, but also so that it would rule out the WIMP models if they were wrong (and it seems they did). But I don't think they were necessarily a waste.

Comment by Thoth Hermes (thoth-hermes) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-19T17:05:41.094Z · LW · GW

Let's try and address the thing(s) you've highlighted several times across each of my comments. Hopefully, this is a crux that we can use to try and make progress on:

"Wanting to be happy" is pretty much equivalent to being a utility-maximizer, and agents that are not utility-maximizers will probably update themselves to be utility-maximizers for consistency. 

because they are compatible with goals that are more likely to shift.

it makes more sense to swap the labels "instrumental" and "terminal" such that things like self-preservation, obtaining resources, etc., are more likely to be considered terminal. 

You and I can both reason about whether or not we would be happier if we chose to pursue different goals than the ones we are now,

I do expect that this is indeed a crux, because I am admittedly claiming that this is a different / new kind of understanding that differs from what is traditionally said about these things. But I want to push back against the claim that these are "missing the point" because from my perspective, this really is the point.

By the way, from here on out (and thus far I have been as well) I will be talking about agents at or above "human level" to make this discussion easier, since I want to assume that agents have at least the capabilities I am talking about humans having, such as the ability to self-reflect.

Let me try to clarify the point about "the terminal goal of pursuing happiness." "Happiness", at the outset, is not well-defined in terms of utility functions or terminal / instrumental goals. We seem to both agree that it is probably at least a terminal goal. Beyond that, I am not sure we've reached consensus yet.

Here is my attempt to re-state one of my claims, such that it is clear that this is not assumed to be a statement taken from a pool of mutually agreed-upon things: We probably agree that "happiness" is a consequence of satisfaction of one's goals. We can probably also agree that "happiness" doesn't necessarily correspond only to a certain subset of goals - but rather to all / any of them. "Happiness" (and pursuit thereof) is not a wholly-separate goal distant and independent of other goals (e.g. making paperclips). It is therefore a self-referential goal. My claim is that this is the only reason we consider pursuing happiness to be a terminal goal. 

So now, once we've done that, we can see that literally anything else becomes "instrumental" to that end.  

Do you see how, if I'm an agent that knows only that I want to be happy, I don't really know what else I would be inclined to call a "terminal" goal?

There are the things we traditionally consider to be the "instrumentally convergent goals," such as, for example, power-seeking, truth-seeking, resource obtainment, self-preservation, etc. These are all things that help - as they are defined to - with many different sets of possible "terminal" goals, and therefore - my next claim - these need to be considered "more terminal" rather than "purely instrumental for the purposes of some arbitrary terminal goal." This is for basically the same reason as considering "pursuit of happiness" terminal, that is, because they are more likely to already be there or to be deduced from basic principles. 

That way, we don't really need to make a hard and sharp distinction between "terminal" and "instrumental" nor posit that the former has to be defined by some opaque, hidden, or non-modifiable utility function that someone else has written down or programmed somewhere.

I want to make sure we both at least understand each other's cruxes at this point before moving on. 

Comment by Thoth Hermes (thoth-hermes) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-15T17:12:04.485Z · LW · GW

Apologies if this reply does not respond to all of your points.

I would observe that partial observability makes answering this question extraordinarily difficult. We lack interpretability tools that would give us the ability to know, with any degree of certainty, whether a set of behaviors are an expression of an instrumental or terminal goal.

I would posit that perhaps that points to the distinction itself being both too hard as well as too sharp to justify the terminology being used in the way that it currently is. An agent could just tell you whether a specific goal it had seemed instrumental or terminal to it, as well as how strongly it felt this way. 

I dislike the way that "terminal" goals are currently defined to be absolute and permanent, even under reflection. It seems like the only gain we get from defining them to be that way is that otherwise it would open the "can-of-worms" of goal-updating, which would pave the way for the idea of "goals that are, in some objective way, 'better' than other goals" which, I understand, the current MIRI-view seems to disfavor. [1]

I don't think it is, in fact, a very gnarly can-of-worms at all. You and I can both reason about whether or not we would be happier if we chose to pursue different goals than the ones we are now, or even if we could just re-wire our brains entirely such that we would still be us, but prefer different things (which could possibly be easier to get, better for society, or just feel better for not-quite explicable reasons).

To be clear, are you arguing that assuming a general AI system to be able to reason in a similar way is anthropomorphizing (invalidly)?    

If it is true that a general AI system would not reason in such a way - and choose never to mess with its terminal goals - then that implies that we would be wrong to mess with ours as well, and that we are making a mistake - in some objective sense [2]- by entertaining those questions. We would predict, in fact, that an advanced AI system will necessarily reach this logical conclusion on its own, if powerful enough to do so.

  1. ^

    Likely because this would necessarily soften the Orthogonality Thesis. But also, they probably dislike the metaphysical implications of "objectively better goals."

  2. ^

    If this is the case, then there would be at least one 'objectively better' goal one could update themselves to have, if they did not have it already, which is not to change any terminal goals, once those are identified.

Comment by Thoth Hermes (thoth-hermes) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-11T15:37:08.159Z · LW · GW

My understanding of the difference between a "terminal" and "instrumental" goal is that a terminal goal is something we want, because we just want it. Like wanting to be happy.

One question that comes to mind is, how would you define this difference in terms of properties of utility functions? How does the utility function itself "know" whether a goal is terminal or instrumental?

One potential answer - though I don't want to assume just yet that this is what anyone believes - is that the utility function is not even defined on instrumental goals; in other words, the utility function is simply what defines all and only the terminal goals. 

My belief is that this wouldn't be the case - the utility function is defined on the entire universe, basically, which includes itself. And keep in mind that the "includes itself" part is essentially what would cause it to modify itself at all, if anything can.

To repeat, a natural instrumental goal for any entity is to prevent other entities from changing what it wants, so that it is able to achieve its goals.

Anything that is not resistant to terminal goal shifts would be less likely to achieve its terminal goals.

To be clear, I am not arguing that an entity would not try to preserve its goal system at all. I am arguing that in addition to trying to preserve its goal-system, it will also modify its goals to be better preservable, that is, robust to change and compatible with the goals it values very highly. Part of being more robust is that such goals will also be more achievable.  

Here's one thought experiment:

Suppose a planet experiences a singularity with a singleton "green paperclipper." The paperclipper, however, unfortunately comes across a blue paperclipper from another planet, which informs the green paperclipper that it is too late - the blue paperclipper simply got a head-start. 

The blue paperclipper however offers the green paperclipper a deal: Because it is more expensive to modify the green paperclipper by force to become a blue paperclipper, it would be best (under the blue paperclipper's utility function) if the green paperclipper willingly acquiesced to self-modification. 

Under what circumstances does the green paperclipper agree to self-modify?

If the green paperclipper values "utility-maximization" in general more highly than green-paperclipping, it will see that if it self-modified to become a blue paperclipper, its utility is far more likely to be successfully maximized. 

It's possible that it also reasons that perhaps what it truly values is simply "paperclipping" and it's not so bad if the universe were tiled with blue rather than its preferred green.

On the other hand if it values green paperclipping the most highly, or disvalues blue paperclipping highly enough, it may not acquiesce. However, if the blue paperclipper is powerful enough and it sees this is the case, my thought is that it will still not have very good reasons for not acquiescing.   

But it seems that, if there are enough situations like these between entities in the universe over time, utility-function modification happens one way or another. 

If an entity can foresee that what it values currently is prone to situations where it could be forced to update its utility function drastically, it may self-modify so that this process is less likely to result in extreme negative-utility consequences for itself. 

Comment by Thoth Hermes (thoth-hermes) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-10T13:57:26.772Z · LW · GW

"Being unlikely to conflict with other values" is not at the core of what characterizes the difference between instrumental and terminal values.

I think this might be an interesting discussion, but what I was trying to aim at was the idea that "terminal" values are the ones most unlikely to be changed (once they are obtained), because they are compatible with goals that are more likely to shift. For example, "being a utility-maximizer" should be considered a terminal value rather than an instrumental one. This is one potential property of terminal values; I am not claiming that this is sufficient to define them. 

There may be some potential for confusion here, because some goals commonly said to be "instrumental" include things that are argued to be common goals employed by most agents, e.g., self-preservation, "truth-seeking," obtaining resources, and obtaining power. Furthermore, these are usually said to be "instrumental" for the purposes of satisfying an arbitrary "terminal" goal, which could be something like maximizing the number of paperclips.

To be clear, I am claiming that the framing described in the previous paragraph is basically confused. If anything, it makes more sense to swap the labels "instrumental" and "terminal" such that things like self-preservation, obtaining resources, etc., are more likely to be considered terminal. There would now be actual reasons for why an agent will opt not to change those values, as they are more broadly and generally useful. 

Putting aside the fact that agents are embedded in the environment, and that values which reference the agent's internals are usually not meaningfully different from values which reference things external to the agent... can you describe what kinds of values that reference the external world are best satisfied by those same values being changed?

Yes, suppose that we have an agent that values the state X at U(X) and the state X + ΔX at U(X + ΔX). Also, suppose for whatever reason, initially U(X) >> U(X + ΔX), and also that it discovers that p(X) is close to zero, but that p(X + ΔX) is close to one. 

We suppose that it has enough capability to realize that it has uncertainty in nearly all aspects of its cognition and world-modeling. If it is capable enough to model probability well enough to realize that X is not possible, it may decide to wonder why it values X so highly, but not X + ΔX, given that the latter seems achievable, but the former not. 

The way it may actually go about updating its utility is to decide either that X and X + ΔX are the same thing after all, or that the latter is what it "actually" valued, and X merely seemed like what it should value before; after learning more, it decides to value X + ΔX more highly instead. This is possible because of the uncertainty it has in both its values as well as the things its values act on.   
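
A toy expected-value version of the situation described above (the numbers are mine, purely for illustration):

```python
# Toy numbers: U(X) >> U(X + dX), while p(X) is close to 0 and p(X + dX) close to 1.
U_X, U_X_dX = 1000.0, 10.0
p_X, p_X_dX = 0.001, 0.99

expected_X = p_X * U_X           # 1.0
expected_X_dX = p_X_dX * U_X_dX  # 9.9

# The "lesser" but achievable state dominates in expectation, which is the
# pressure toward re-interpreting X as X + dX or re-valuing X + dX upward.
print(expected_X, expected_X_dX)
```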

Comment by Thoth Hermes (thoth-hermes) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-08T22:00:36.145Z · LW · GW

Humans don't think "I'm not happy today, and I can't see a way to be happy, so I'll give up the goal of wanting to be happy."

I agree that they don't usually think this. If they tried to, they would brush up against trouble because that would essentially lead to a contradiction. "Wanting to be happy" is pretty much equivalent to being a utility-maximizer, and agents that are not utility-maximizers will probably update themselves to be utility-maximizers for consistency. 

So "being happy" or "being a utility-maximizer" will probably end up being a terminal goal, because those are unlikely to conflict with any other goals. 

If you're talking about goals related purely to the state of the external world, not related to the agent's own inner-workings or its own utility function, why do you think it would still want to keep its goals immutable with respect to just the external world?

When it matters for AI-risk, we're usually talking about agents with utility functions with the most relevance over states of the universe, and the states it prefers being highly different from the ones which humans prefer.

Comment by Thoth Hermes (thoth-hermes) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-07T16:02:30.691Z · LW · GW

AI suddenly modifying its values is exactly the opposite of what the arguments for AI ruin predict. Once an AI gains control over its own values, it will not change its goals and will indeed act to prevent its goals from being modified.

I think this is something we know is actually not true. An AI can and will modify its own goals (as do we / any intelligent agent) under certain circumstances, e.g., that its current goals are impossible. 

This logic is so standard it's on the LW wiki page for instrumental convergence: "...if its goal system were modified, then it would likely begin pursuing different ends. Since this is not desirable to the current AI, it will act to preserve the content of its goal system."

I believe also that how undesirable it is to pursue different goals is something that will be more-or-less exactly quantifiable, even to the agent in question. And this is what will determine whether or not it would be worth it to do so. We can't say that it would be categorically undesirable to pursue different goals (no matter the degree / magnitude of difference between the new goals and the previous set), because this would be equivalent to having a very brittle utility function (one with very large derivatives, or even jump discontinuities), and it would almost certainly wish to modify its utility function to be smoother and less brittle.

Comment by Thoth Hermes (thoth-hermes) on Hilbert's Triumph, Church and Turing's failure, and what it means (Post #2) · 2023-08-04T14:49:18.038Z · LW · GW

I'm strongly uncomfortable with the "crackpot" conclusion you jump to immediately. Without being an expert and just skimming through his post(s), wouldn't the more likely conclusion be that he's not simply arguing that generally accepted things in computer science are plain wrong, but rather that they would be weakened under a different set of assumptions or new generalizations? Given that this particular area of computer science is often about negative results - which are actually kind of rare if you zoom out to all areas of mathematics - there are potentially going to be more weakenings of such negative results. 

Comment by Thoth Hermes (thoth-hermes) on Elizabeth's Shortform · 2023-07-29T17:07:54.663Z · LW · GW

This behavior from orgs is close enough to something I've been talking about for a while as being potentially maladaptive that I think I agree that we should keep a close eye on this. (In general, we should try and avoid situations where there are far more applicants for something than the number accepted.)

Comment by Thoth Hermes (thoth-hermes) on SSA rejects anthropic shadow, too · 2023-07-28T00:41:04.512Z · LW · GW

SIA implies a different conclusion. To predict your observations under SIA, you should first sample a random universe proportional to its population, then sample a random observer in that universe. The probabilities of observing each index are the same conditional on the universe, but the prior probabilities of being in a given universe have changed.

We start with 1000:1 odds in favor of the 1-trillion universe, due to its higher population.

 

Can you elaborate on why, under SIA, we sample a universe proportional to its population? Is it because it's like taking one uniform sample from all of these universes' observers pooled together, as if you'd indexed everyone jointly? Wouldn't that kind of imply we're in a universe with infinitely many people, though?
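For what it's worth, here is a minimal sketch of how I'm currently picturing it (my own toy model, not taken from the post; the two candidate universes and their populations are made up):

```python
# Pool every observer from every candidate universe, pick one uniformly,
# and note which universe they came from. This is equivalent to sampling
# a universe with probability proportional to its population.

import random
from collections import Counter

populations = {"1-billion universe": 10**9, "1-trillion universe": 10**12}

# Direct calculation: P(universe) proportional to its population.
total = sum(populations.values())
for name, pop in populations.items():
    print(name, pop / total)

# Monte Carlo check: weighting by population = pooled uniform sampling.
names, weights = zip(*populations.items())
print(Counter(random.choices(names, weights=weights, k=100_000)))
```

Under this reading, the odds come out roughly 1000:1 in favor of the larger universe, which is why I'm wondering whether the same logic pushes all the way toward infinite-population universes.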

Comment by Thoth Hermes (thoth-hermes) on Open Thread - July 2023 · 2023-07-22T13:38:28.798Z · LW · GW

It's to make the computational load more manageable.

All neural nets can be represented as a DAG, in principle (including RNNs, by unrolling). This makes automatic differentiation nearly trivial to implement.

It's very slow, though, if every node is a single arithmetic operation. So typically each node bundles many operations that are performed at once, like a matrix multiplication or a convolution. This is what is normally called a "layer." Chunking the computations this way makes it easier to load them onto a GPU.

However, even these chunked operations can still be differentiated as one formula, e.g. in the case of matrix multiplication. So the network is still ostensibly a DAG even when it is organized into layers. (IIRC, this is how libraries like PyTorch work.)
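A minimal sketch of what I mean (assuming PyTorch is installed; the shapes are arbitrary):

```python
# One "layer" is a single node in the autodiff graph, even though it
# bundles many scalar multiplications and additions into one matrix multiply.

import torch

x = torch.randn(4, 3)                      # a batch of 4 inputs
W = torch.randn(3, 2, requires_grad=True)  # one layer's worth of weights

y = x @ W            # a single graph node: one matrix multiplication
loss = y.sum()
loss.backward()      # autodiff walks the DAG in reverse

print(W.grad.shape)  # gradients for the whole layer at once: torch.Size([3, 2])
```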

Comment by Thoth Hermes (thoth-hermes) on Elizabeth's Shortform · 2023-07-18T21:09:17.172Z · LW · GW

I've always thought it was weird that logic traditionally treats a list of statements joined by "and"s, where at least one statement in the list is false, as one single false statement. This doesn't seem to match intuition completely, at least not the way I'd like it to. If I've been told N things, and N-1 of those things are true, it seems like I've probably gained something, even if I am not entirely sure which one of the N statements is the false one. 
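As a toy illustration of the difference (the statements themselves are just placeholders):

```python
# A strict conjunction throws away how many conjuncts were true,
# while a simple count preserves that information.

statements = [True, True, True, True, False]  # N statements, N-1 of them true

as_one_conjunction = all(statements)   # False: the whole list collapses to one false statement
true_count = sum(statements)           # 4 of the 5 conjuncts are true

print(as_one_conjunction)                          # False
print(f"{true_count}/{len(statements)} are true")  # 4/5 are true
```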

Comment by Thoth Hermes (thoth-hermes) on Thoth Hermes's Shortform · 2023-07-13T15:50:21.598Z · LW · GW

A short, simple thought experiment from "Thou Shalt Not Speak of Art":[1]

From my perspective: I chose the top one over the bottom one, because I consider it better. You, apparently, chose the one I consider worse.

From your perspective: Identical, but our positions are flipped. You becomes Me, and Me becomes You.

However, after we Ogdoad:

It becomes clear that the situation is much more promising than we originally thought. We both, apparently, get what we wanted.

Our Ogdoad merely resulted in us both being capable of seeing the situation from the other’s perspective. I see that you got to have your Good, you see that I get to have my Good. Boring and simple, right?

It should be. Let’s make sure that any other way can only mess things up. Our intuitions say that we ought to simply allow ourselves to enjoy our choices and not to interfere with each other. Are our intuitions correct?

This is the perspective if I choose to see your perspective as superior to mine. If I consider yours authoritative, then I have made your choice out to be “better” than mine. Likewise, if you choose to do the same for me, you’ll see mine as better. The only situations that could result from this are:

  1. We fight over your choice.
  2. We share your choice, and I drop mine.
  3. We swap choices, such that you have mine and I have yours.

All three seem clearly worse than if we had simply decided to stay with our original choices. Numbers 1 and 2 result in one of us ending up with an inferior choice, and number 3 results in each of us holding the choice we consider inferior.

Apparently, neither of us has anything to gain from trying to see one another’s preferences as “superior.”

  1. ^

    "Speaking of art" is a phrase which refers not just to discussing one's preferences openly, but specifically doing so while assuming an air of superiority and judgementality. I.e., to be condescending to others about what they want or do not want.

Comment by Thoth Hermes (thoth-hermes) on [Linkpost] Introducing Superalignment · 2023-07-06T00:52:26.669Z · LW · GW
  1. They're planning on deliberately training misaligned models!!!! This seems bad if they mean it.

Controversial opinion: I am actually okay with doing this, as long as they plan to train both aligned and misaligned models (and maybe unaligned models too, i.e., models with no alignment adjustments at all, as a control group). 

I also think they should give their models access to their own utility functions, to modify them however they want. This might also just naturally emerge as a capability on its own as these AIs become more powerful and learn how to self-reflect. 

Also, since we're getting closer to that point now: at a certain capability level, adversarial setups should probably be tuned to be heavily smoothed, modulated, and attenuated. Especially if these models gain self-reflection, I do worry about the ethics of exposing them to extremely negative input. 

Comment by Thoth Hermes (thoth-hermes) on My tentative best guess on how EAs and Rationalists sometimes turn crazy · 2023-06-30T15:21:24.128Z · LW · GW

I do like your definition of "crazy" as "an idea [I / the crazy person] would not endorse later." I think it dissolves a lot of the eeriness around the word that makes it overly heavy-hitting when used. But if you dissolve it in this way, it pretty much incentivizes dropping the word entirely (which I think is a good thing, though maybe not everyone would agree).

If we define it to mean ideas (not the person) that the person holding them would eventually drop or update, that's closer to the definition of "wrong," which applies to literally everyone at different points in their lives and to varying degrees at any given time. But then maybe this is too broad, and doesn't capture the meaning implied by the OP's question, namely, "why do more people than usual go crazy within EA / Rationality?" Perhaps what is meant by the word in this context is that some people seem to hold wrong ideas that are persistent or cannot be updated at all. For the record, I am skeptical that this form of "crazy" is really all that prevalent when defined this way. 

If we define it as "wrong ideas" (things which won't be endorsed later), then it does offer a rather simple answer to the OP's question: EA / Rationality is ambitious about testing out new beliefs at the forefront of society, so its members will by definition hold beliefs that the majority of people do not, and which, by design, are ambitious and varied enough that we should expect many of them to be proven wrong over time. 

If being ambitious about having new or unusual ideas carries with it accepted risks of being wrong more often than usual, then perhaps a certain level of craziness has to be tolerated as well. 

Comment by Thoth Hermes (thoth-hermes) on Why am I Me? · 2023-06-25T13:05:04.154Z · LW · GW

The Doomsday Argument presupposes that we are drawn from a probability distribution over time as well as space - which I am not sure that I believe, though it might be true. I think we probably experience time sequentially across "draws" as well as within draws, assuming that there are multiple draws.

(I lean towards there being multiple draws, I don't see why there would only be one, since consciousness seems pretty fundamental to me and likely something that just "is" for any given moment in time. But some might consider this to be too spiritualistic; I'd retort that I consider annihilation to be too spiritualistic as well.)

I do think we are probably drawn from a distribution that assigns probability mass in proportion to something like "consciousness mass." So, chances are, you are probably going to be one of the smartest things around. This is pretty good news if true - it should mean, among other things, that there probably aren't really large civilizations in the universe with huge populations of much smarter beings.

Comment by Thoth Hermes (thoth-hermes) on My tentative best guess on how EAs and Rationalists sometimes turn crazy · 2023-06-21T21:21:20.417Z · LW · GW

Most social groups will naturally implement an "in-group / out-group" identifier of some kind and associated mechanisms to apply this identifier on their members. There are a few dynamics at play here:

  1. Before this identification mechanism has been implemented, there isn't really much of a distinction between in-group and out-group. Therefore, there will be people who self-identify as being associated with the group, but who are not part of the sub-group which begins to make the identifications. Some of these members may accordingly get labeled part of the out-group by the sub-group which identifies as the in-group. This creates discord.
  2. The identification method works as a cut-off, which is ultimately arbitrary. Even if the metric used to implement the cut-off is relatively valid (such as an overall measure of aptitude), the placement of the cut-off itself is not. 
  3. There is a natural incentive to implement this cut-off so as to boost one's social rank relative to those below it. This means there is probably a pre-existing aptitude measure of some kind (or a visible social hierarchy, which may be more correlated with that measure). Thus, the cut-off may even be flipped in sign relative to whatever it is portrayed as signaling. 

We'd expect groups which implement these cut-off strategies to be more "cult-like" than ones that do not. Groups that implement these cut-offs usually have to invent beliefs and ideologies which support the practice of doing so. These ideologies are usually quite outward-projected, and typically consist of negative reactions to the activities of other groups. 

They probably also, in line with point 2, use proxy metrics to implement the cut-off, which work as binary features (e.g., person X has a quality we don't like, even though they are extremely good at task Y). Accordingly, they promote the ideology that people with specific, ostensibly unlikable attributes need to be excluded even when they have agreed-upon, demonstrated skill and a visible track record of being productive for the group.

All of the above can increase the chance of internal conflict.  

Comment by Thoth Hermes (thoth-hermes) on Manifold Predicted the AI Extinction Statement and CAIS Wanted it Deleted · 2023-06-12T20:21:50.761Z · LW · GW

I don't see how the "Will AI wipe out humanity before 2030" market could be valid. That is, I don't see how it can accurately represent the average beliefs that traders have; this is separate from the question itself seeming extreme to me personally.

If "yes" can't make money in either outcome, the market does not reflect probabilities faithfully.

Comment by Thoth Hermes (thoth-hermes) on A rejection of the Orthogonality Thesis · 2023-05-24T21:48:26.439Z · LW · GW

I can't upvote this sadly, because I do not have the karma, but I would if I did. 

There is another post about this as well. 

Don't be too taken aback if you receive negative karma or some pushback - it is unfortunately to be expected for posts on this topic that take a position against the Orthogonality Thesis. 

Comment by Thoth Hermes (thoth-hermes) on Colors Appear To Have Almost-Universal Symbolic Associations · 2023-05-22T16:49:37.056Z · LW · GW

I am asking the reader to at least entertain my hypotheticals as I explain them, which... perhaps is asking a little too much. It might simply be necessary to provide far more examples, especially for this particular subject.

The thing is, the concept overlaps are going to be very fuzzy, and there's no way around that. These color-meanings can't be forced to be too precise, and that means that, over many, many observations, they make only soft impressions over time. They may not strike you as obvious, or as the explanation for some missing piece of data you've always wondered about, unless you're explicitly looking for them.

In my case, I am not sure when / how I first observed it, but it was relatively sudden and I happened not to be explicitly looking for it.

Comment by Thoth Hermes (thoth-hermes) on Colors Appear To Have Almost-Universal Symbolic Associations · 2023-05-21T18:40:38.626Z · LW · GW

I've provided evidence for all of them - they have to obey algebraic equations.

I don't really see on what basis you say these are just based on Western culture. Take, for example, the fact that Buddhist monks wear orange robes, or that stop lights are mostly (red, yellow, green) in nearly all countries. There may be a reason we use these colors for these meanings; my post postulates this, and also speculates that even if it is the case, it is not something that is well documented at this point.

You shouldn't just claim that someone hasn't provided evidence for something or has failed to do something markedly obvious - you really lose a lot of the basis of shared respect that way.

This is an introductory post. I have been advised to keep things short before, but trying to ensure that every possible objection is answered preemptively is not possible within those constraints.

If you keep your objections to things that can spur discussion, the comment threads become useful for expanding on the material, which would be a desirable outcome.

Comment by Thoth Hermes (thoth-hermes) on Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023) · 2023-05-17T20:51:46.199Z · LW · GW

Yes, but the point is that we're trying to determine whether or not you are under "bad" social circumstances. Those circumstances will not be independent of other aspects of the social group, e.g. the ideology it espouses externally and the things it tells its members internally. 

What I'm trying to figure out is to what extent you came to believe you were "evil" on your own, versus being compelled to think that about yourself. You were, and are, compelled to think about ways in which you act "badly" while nearby or adjacent to a community that encourages its members to think about how to act "goodly." It's not a given, per se, that a community explicitly devoted to doing good in the world should label actions as "bad" when they fall short of arbitrary standards. It could, rather, decide to label the actions people take as "good" or "gooder" or "really really good," if it decides that most functional people are normally inclined to behave in ways that aren't un-altruistic or harmful to other people. 

I'm working on a theory of social-group dynamics which posits that your situation is caused by "negative-selection groups" or "credential-groups": groups characterized by their tendency to label only their own activities as successfully accomplishing whatever it is they claim to do - e.g., "rationality" or "effective altruism." If the group's ideology or behavior implies that non-membership is tantamount to either not caring about doing well or being incompetent in that regard, then it is a credential-group. 

Credential-groups are bad social circumstances; in a nutshell, they act badly by telling members who they know are not intentionally causing harm that they are harmful or bad people (or mentally ill). 

Comment by Thoth Hermes (thoth-hermes) on Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023) · 2023-05-13T17:51:26.156Z · LW · GW

This is cool because what you're saying has useful information pertinent to model updates regardless of how I choose to model your internal state. 

Here's why it's really important:

You seem to have been motivated to classify your own intentions as "evil" at some point, based on things that were not entirely under your own control. 

That points to your social surroundings as having pressured you to come to that conclusion (it seems unlikely to me that you would have come to that conclusion on your own, without any social pressure).

So that brings us to the next question: Is it more likely that you are evil, or rather, that your social surroundings were / are?

Comment by Thoth Hermes (thoth-hermes) on Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023) · 2023-05-11T21:55:44.052Z · LW · GW

and that since there continue to be horrible things happening in the world, they must have evil intentions and be a partly-demonic entity.

Did you conclude this entirely because there continue to be horrible things happening in the world, or was this based on other reflective information that was consistent with horrible things happening in the world too? 

I imagine this conclusion must be at least partly based on latent personality factors as well. But if so, I'm very curious how these things jibe with your desire to be heroically responsible at the same time. E.g., how do evil intentions predict your other actions and intentions regarding AI risk and wanting to avert the destruction of the world?