Posts

AI alignment researchers may have a comparative advantage in reducing s-risks 2023-02-15T13:01:50.799Z
Moral Anti-Realism: Introduction & Summary 2022-04-02T14:29:01.751Z
Moral Anti-Epistemology 2015-04-24T03:30:27.972Z
Arguments Against Speciesism 2013-07-28T18:24:58.354Z

Comments

Comment by Lukas_Gloor on Failures in Kindness · 2024-03-27T22:59:44.675Z · LW · GW

I really liked this post! I will probably link to it in the future.

Edit: Just came to my mind that these are things I tend to think of under the heading "considerateness" rather than kindness, but it's something I really appreciate in people either way (and the concepts are definitely linked). 

Comment by Lukas_Gloor on On Lex Fridman’s Second Podcast with Altman · 2024-03-27T21:32:34.455Z · LW · GW

FWIW, one thing I really didn't like about how he came across in the interview is that he seemed to be framing the narrative one-sidedly in an underhanded way, sneakily rather than out in the open. (Everyone tries to frame the narrative in some way, but it becomes problematic when people don't point out the places where their interpretation differs from others', because then listeners won't easily realize that there are claims they still need to evaluate and think about, rather than take for granted as something everyone else already agrees on.) 

He was not highlighting the possibility that the other side's perspective still has validity; instead, he was sweeping that possibility under the carpet. He talked as though (implicitly, not explicitly) it's now officially established or obviously true that the board acted badly (Lex contributed to this by asking easy questions and not pushing back on anything too much). He focused a lot on the support he got during this hard time and on people saying good things about him (the eulogy-while-still-alive comparison, highlighting that he thinks there's no doubt about his character), said somewhat condescending things about the former board (about how he thinks they had good intentions, said in that slow voice and thoughtful tone, almost as if they had committed a crime), and then emphasized their lack of experience. 

For contrast, here are things he could have said that would have made it easier for listeners to come to the right conclusions. (I think anyone who is morally scrupulous about whether they're in the right, in situations where many others speak up against them, would have highlighted these points a lot more, so the absence of these bits in Altman's interview tells us something.)

  • Instead of just saying that he believes the former board members came from a place of good intentions, also say whether he believes that some of the things they were concerned about weren't totally unreasonable from their perspective. E.g., acknowledge things he did wrong or things that, while not wrong, would understandably lead to misunderstandings.
  • Acknowledge that just because a decision has been made by the review committee, the matter of his character and suitability for OpenAI's charter is not now settled (esp. given that the review maybe had a somewhat limited scope?). He could point out that it's probably rational (or, if he thinks this is not necessarily mandated, at least flag that he'd understand if some people now feel that way) for listeners of the YouTube interview to keep an eye on him, while explaining how he intends to prove that the review committee came to the right decision. 
  • He said the board was inexperienced, but he'd say that in any case, whether or not they were onto something. Why is he talking about their lack of experience so much rather than zooming in on their ability to assess someone's character? It could totally be true that the former board was both inexperienced and right about Altman's unsuitability. Pointing out this possibility himself would be a clarifying contribution, but instead, he chose to distract from that entire theme and muddy the waters by making it seem like all that happened was that the board did something stupid out of inexperience, and that's all there was.
  • Acknowledge that it wasn't just an outpouring of support for him; there were also some people who used the occasion to voice critical takes about him (and the Y Combinator thing came to light). 

(Caveat that I didn't actually listen to the full interview and therefore may have missed it if he did more signposting and perspective-taking and "acknowledging that for-him-inconvenient hypotheses are now out there, important if true, and hard to dismiss entirely, at least for people without private info" than I would've thought from skipping through segments of the interview and Zvi's summary.)

In reaction to what I wrote here, maybe it's a defensible stance to go like, "ah, but that's just Altman being good at PR; it's just bad PR for him to give any air of legitimacy to the former board's concerns." 

I concede that, in some cases when someone accuses you of something, they're just playing dirty, and your best way to make sure it doesn't stick is not to engage with low-quality criticism. However, there are also situations where concerns have enough legitimacy that sweeping them under the carpet doesn't help you seem trustworthy. In those cases, I find it extra suspicious when someone sweeps the concerns under the carpet and thereby misses the opportunity to add clarity to the discussion, make themselves more trustworthy, and help people form better views on what's the case.

Maybe that's a high standard, but I'd feel more reassured if the frontier of AI research was steered by someone who could talk about difficult topics and uncertainty around their suitability in a more transparent and illuminating way. 

Comment by Lukas_Gloor on On Lex Fridman’s Second Podcast with Altman · 2024-03-27T21:10:08.017Z · LW · GW

There are realistic beliefs Altman could have about what's good or bad for AI safety that would not allow Zvi to draw that conclusion. For instance: 

  • Maybe Altman thinks it's really bad for companies' momentum to go through CEO transitions (and we know that he believes OpenAI having a lot of momentum is good for safety, since he sees them as both adequately concerned about safety and more concerned about it than competitors).
  • Maybe Altman thinks OpenAI would be unlikely to find another CEO who understands the research landscape well enough while also being good at managing, who is at least as concerned about safety as Altman is.
  • Maybe Altman was sort of willing to "put that into play," in a way, but his motivation to do so wasn't a desire for power, nor a calculated strategic ploy, but more the understandable human tendency to hold a grudge (esp. in the short term) against the people who just rejected and humiliated him, so he understandably didn't feel a lot of motivational pull to want to help them look better about the coup they had just attempted for what seemed to him like unfair/bad reasons. (This still makes Altman look suboptimal, but it's a lot different from "Altman prefers power so much that he'd calculatedly put the world at risk for his short-term enjoyment of power.")
  • Maybe the moments where Altman thought things would go sideways were only very brief, and for the most part, when he was taking actions towards further escalation, he was already very confident that he'd win. 

Overall, the point is that it seems maybe a bit reckless/uncharitable to make strong inferences about someone's ranking of priorities based on a single remark of theirs being in tension with them pushing in one direction rather than the other in a complicated political struggle.

Comment by Lukas_Gloor on Balancing Games · 2024-02-26T14:12:28.393Z · LW · GW

Small edges are why there's so much money gambled in poker. 

It's hard to reach a skill level where you make money on 50% of nights, but it's not that hard to reach a point where you're "only" losing 60% of the time. (That's still significantly worse than playing roulette, but compared to chess competitions, where hobbyists never win any sort of prize, you've at least got chances.) 

Comment by Lukas_Gloor on I'd also take $7 trillion · 2024-02-20T16:23:43.441Z · LW · GW

You criticize Altman for pushing ahead with dangerous AI tech, but then most of what you'd spend the money on is pushing ahead with tech that isn't directly dangerous. Sure, that's better. But it doesn't solve the issue that we're headed into an out-of-control future. Where's the part where we use money to improve the degree to which thoughtful high-integrity people (or prosocial AI successor agents with those traits) are able to steer where this is all going? 
(Not saying there are easy answers.) 

Comment by Lukas_Gloor on Are most personality disorders really trust disorders? · 2024-02-09T00:02:13.275Z · LW · GW

I mean, personality disorders are all about problems in close interpersonal relationships (or lack of interest in such relationships, in schizoid personality disorder), and trust is always really relevant in such relationships, so I think this could be a helpful lens for looking at things. At the same time, I'd be very surprised if you could derive new helpful treatment approaches from this sort of armchair reasoning (even just at the level of hypothesis generation to be subjected to further testing).

Also, some of these seem a bit strained: 

  • Narcissistic personality disorder seems to be more about superiority and entitlement than expecting others to be trusting. And narcissism is correlated with Machiavellianism, where a feature of that is having a cynical worldview (i.e., thinking people in general aren't trustworthy). If I had to frame narcissism in trust terms, I'd maybe say it's an inability to value or appreciate trust?
  • Histrionic personality disorder has a symptom criterion of "considers relationships to be more intimate than they actually are." I guess maybe you could say "since (by your hypothesis) they expect people to not care, once someone cares, a person with histrionic personality disorder is so surprised that they infer that the relationship must be deeper than it is." A bit strained, but maybe can be made to fit.
  • Borderline: I think there's more of a pattern to splitting than randomness (e.g., you rarely have splitting in the early honeymoon stage of a relationship), so maybe something like "fluctuating" would fit better. But also, I'm not sure what fluctuates is always about trust. Sure, sometimes splitting manifests in accusing the partner of cheating out of nowhere, but in other cases, the person may feel really annoyed at the partner in a way that isn't related to trust. (Or it could be related to trust, but going in a different direction: they may resent the partner for trusting them because they have such a low view of themselves that anyone who trusts them must be unworthy.)
  • Dependent: To me the two things you write under it seem to be in tension with each other.

Edit:

Because it takes eight problems currently considered tied up with personal identity and essentially unsolvable [...]

I think treatment success probabilities differ between personality disorders. For some, calling them "currently considered essentially unsolvable" seems wrong.

And not sure how much of OCPD is explained by calling it a persistent form of OCD – they seem very different. You'd expect "persistent" to make something worse, but OCPD tends to be less of an issue for the person who has it (but can be difficult for others around them). Also, some symptoms seem to be non-overlapping, like with OCPD I don't think intrusive thoughts play a role (I might be wrong?), whereas intrusive thoughts are a distinct and telling feature of some presentations of OCD.

Comment by Lukas_Gloor on [Intro to brain-like-AGI safety] 10. The alignment problem · 2024-02-07T16:24:29.964Z · LW · GW

Dilemma:

  • If the Thought Assessors converge to 100% accuracy in predicting the reward that will result from a plan, then a plan to wirehead (hack into the Steering Subsystem and set reward to infinity) would seem very appealing, and the agent would do it.
  • If the Thought Assessors don’t converge to 100% accuracy in predicting the reward that will result from a plan, then that’s the very definition of inner misalignment!

    [...]

    The thought “I will secretly hack into my own Steering Subsystem” is almost certainly not aligned with the designer’s intention. So a credit-assignment update that assigns more positive valence to “I will secretly hack into my own Steering Subsystem” is a bad update. We don’t want it. Does it increase “inner alignment”? I think we have to say “yes it does”, because it leads to better reward predictions! But I don’t care. I still don’t want it. It’s bad bad bad. We need to figure out how to prevent that particular credit-assignment Thought Assessor update from happening.

    [...]

    I think there’s a broader lesson here. I think “outer alignment versus inner alignment” is an excellent starting point for thinking about the alignment problem. But that doesn’t mean we should expect one solution to outer alignment, and a different unrelated solution to inner alignment. Some things—particularly interpretability—cut through both outer and inner layers, creating a direct bridge from the designer’s intentions to the AGI’s goals. We should be eagerly searching for things like that.

Yeah, there definitely seems to be something off about that categorization. I've thought a bit about how this stuff works in humans, particularly in this post of my moral anti-realism sequence. To give some quotes from that:

One of many takeaways I got from reading Kaj Sotala’s multi-agent models of mind sequence (as well as comments by him) is that we can model people as pursuers of deep-seated needs. In particular, we have subsystems (or “subagents”) in our minds devoted to various needs-meeting strategies. The subsystems contribute behavioral strategies and responses to help maneuver us toward states where our brain predicts our needs will be satisfied. We can view many of our beliefs, emotional reactions, and even our self-concept/identity as part of this set of strategies. Like life plans, life goals are “merely” components of people’s needs-meeting machinery.[8]

Still, as far as components of needs-meeting machinery go, life goals are pretty unusual. Having life goals means to care about an objective enough to (do one’s best to) disentangle success on it from the reasons we adopted said objective in the first place. The objective takes on a life of its own, and the two aims (meeting one’s needs vs. progressing toward the objective) come apart. Having a life goal means having a particular kind of mental organization so that “we” – particularly the rational, planning parts of our brain – come to identify with the goal more so than with our human needs.[9]

[...]

There’s a normative component to something as mundane as choosing leisure activities. [E.g., going skiing in the cold, or spending the weekend cozily at home.] In the weekend example, I’m not just trying to assess the answer to empirical questions like “Which activity would contain fewer seconds of suffering/happiness” or “Which activity would provide me with lasting happy memories.” I probably already know the answer to those questions. What’s difficult about deciding is that some of my internal motivations conflict. For example, is it more important to be comfortable, or do I want to lead an active life? When I make up my mind in these dilemma situations, I tend to reframe my options until the decision seems straightforward. I know I’ve found the right decision when there’s no lingering fear that the currently-favored option wouldn’t be mine, no fear that I’m caving to social pressures or acting (too much) out of akrasia, impulsivity or some other perceived weakness of character.[21]

We tend to have a lot of freedom in how we frame our decision options. We use this freedom, this reframing capacity, to become comfortable with the choices we are about to make. In case skiing wins out, then “warm and cozy” becomes “lazy and boring,” and “cold and tired” becomes “an opportunity to train resilience / apply Stoicism.” This reframing ability is a double-edged sword: it enables rationalizing, but it also allows us to stick to our beliefs and values when we’re facing temptations and other difficulties.

[...]

Visualizing the future with one life goal vs. another

Whether a given motivational pull – such as the need for adventure, or (e.g.,) the desire to have children – is a bias or a fundamental value is not set in stone; it depends on our other motivational pulls and the overarching self-concept we’ve formed.

Lastly, we also use “planning mode” to choose between life goals. A life goal is a part of our identity – just like one’s career or lifestyle (but it’s even more serious).

We can frame choosing between life goals as choosing between “My future with life goal A” and “My future with life goal B” (or “My future without a life goal”). (Note how this is relevantly similar to “My future on career path A” and “My future on career path B.”)

[...]

It’s important to note that choosing a life goal doesn’t necessarily mean that we predict ourselves to have the highest life satisfaction (let alone the most increased moment-to-moment well-being) with that life goal in the future. Instead, it means that we feel the most satisfied about the particular decision (to adopt the life goal) in the present, when we commit to the given plan, thinking about our future. Life goals inspired by moral considerations (e.g., altruism inspired by Peter Singer’s drowning child argument) are appealing despite their demandingness – they can provide a sense of purpose and responsibility.

So, it seems like we don't want "perfect inner alignment," at least not if inner alignment is about accurately predicting reward and then forming the plan of doing what gives you the most reward. Also, there's a concept of "lock-in," or "identifying more with the long-term planning part of your brain than with the underlying needs-meeting machinery." Lock-in can be dangerous (if you lock in something that isn't automatically corrigible), but it might also be dangerous not to lock in anything (because this means you don't know what other goals form later on).

Idk, the whole thing seems to me like brewing a potion in Harry Potter, except that you don't have a recipe book and there's luck involved, too. "Outer alignment," a minimally sufficient degree thereof (as in: the agent tends to get rewards when it takes actions towards the intended goal), increases the likelihood that you get broadly pointed in the right direction, so the intended goal maybe gets considered among the things the internal planner considers reinforcing itself around / orienting itself towards. But then, whether the intended goal gets picked over other alternatives (instrumental requirements for general intelligence, or alien motivations the AI might initially have), who knows. Like with raising a child, sometimes they turn out the way the parents intend, sometimes not at all. There's probably a science to finding out how outcomes become more likely, but even if we could do that with human children developing into adults with fixed identities, there's then still the question of how to find analogous patterns in (brain-like) AI. Tough job.

Comment by Lukas_Gloor on [Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation · 2024-02-04T14:34:25.497Z · LW · GW

Conditioned Taste Aversion (CTA) is a phenomenon where, if I get nauseous right now, it causes an aversion to whatever tastes I was exposed to a few hours earlier—not a few seconds earlier, not a few days earlier, just a few hours earlier. (I alluded to CTA above, but not its timing aspect.) The evolutionary reason for this is straightforward: a few hours is presumably how long it typically takes for a toxic food to induce nausea.

That explains why my brother no longer likes mushrooms. When we were little, he liked them and we ate mushrooms at a restaurant, then were driven through curvy mountain roads later that day with the family. He got car sick and vomited, and afterwards he had an intense hatred for mushrooms.

Comment by Lukas_Gloor on On Not Requiring Vaccination · 2024-02-02T22:44:16.022Z · LW · GW

Is that sort of configuration even biologically possible (or realistic)? I have no deep immunology understanding, but I think bad reactions to vaccines have little to nothing to do with whether you're up-to-date on previous vaccines. So far, I'm not sure we're good at predicting who reacts with more severe side effects than average (and even if we were, it's not like it's easy to tweak the vaccine, except for tradeoff-y things like lowering the vaccination dose). 

Comment by Lukas_Gloor on Effective Aspersions: How the Nonlinear Investigation Went Wrong · 2023-12-22T11:46:20.546Z · LW · GW

My point is that I have no evidence that he ended up reading most of the relevant posts in their entirety. I don't think people who read all the posts in their entirety should just go ahead and unilaterally dox discussion participants, but I feel like people who have only read parts of it (or only secondhand sources) should do it even less.

Also, at the time, I interpreted Roko's "request for a summary" more as a way for him to sneer at people. His "summary" had a lot of loaded terms and subjective judgments in it. Maybe this is a style thing, but I think people should only (at most) write summaries like that if they're already well-informed. (E.g., Zvi's writing style can be like that, and I find it fine because he's usually really well-informed. But if I see him make a half-assed take on something he doesn't seem to be well-informed on, I'd downvote.) 

Comment by Lukas_Gloor on Effective Aspersions: How the Nonlinear Investigation Went Wrong · 2023-12-22T04:24:59.031Z · LW · GW

See my comment here.

Kat and Emerson were well-known in the community and they were accused of something that would cause future harm to EA community members as well. By contrast, Chloe isn't particularly likely to make future false allegations even based on Nonlinear's portrayal (I would say). It's different for Alice, since Nonlinear claim she has a pattern. (But with Alice, we'd at least want someone to talk to Nonlinear in private and verify how reliable they seem about negative info they have about Alice, before simply taking their word for it based on an ominous list of redacted names and redacted specifics of accusations.)

Theoretically Ben could have titled his post, "Sharing Information About [Pseudonymous EA Organization]", and requested the mods enforce anonymity of both parties, right?

That would miss the point, rendering the post almost useless. The whole point is to prevent future harm. 

but not for Roko to unilaterally reveal the names of Alice and Chloe?

Alice and Chloe had Ben, who is a trusted community member, look into their claims. I'd say Ben is at least somewhat "on the hook" for the reliability of the anonymous claims.

By contrast, Roko posted a 100 word summary of the Nonlinear incident that got some large number of net downvotes, so he seems to be particularly poorly informed about what even happened.

Comment by Lukas_Gloor on Pseudonymity and Accusations · 2023-12-22T04:07:35.692Z · LW · GW

Some conditions for when I think it's appropriate for an anonymous source to make a critical post about a named someone on the forum:

  • Is the accused a public person or do they run an organization in the EA or rationality ecosystem?
  • Or: Is the type of harm the person is accused of something that the community benefits from knowing?
  • Did someone who is non-anonymous and trusted in the community talk to the anonymous accuser and verify claims and (to some degree*) stake their reputation for them?

*I think there should be a role of "investigative reporter:" someone verifies that the anonymous person is not obviously unreliable. I don't think the investigative reporter is 100% on the hook for anything that will turn out to be false or misleading, but they are on the hook for things like doing a poor job at verifying claims or making sure there aren't any red flags about a person. 

(It's possible for anonymous voices to make claims without the help of an "investigative reporter;" however, in that case, I think the appropriate community reaction should be to give little-to-no credence to such accusations. After all, they could be made by someone who already has had their reputation justifiably tarnished.)

On de-anonymizing someone (and preventing an unfair first-mover advantage): 

  • In situations where the accused parties are famous and have lots of influence, we can view anonymity protection as leveling the playing field rather than conferring an unfair advantage. (After all, famous and influential people already have a lot of advantages on their side – think of Sam Altman in the conflict with the OpenAI board.)
  • If some whistleblower displays a pattern/history of making false accusations, that implies potential for future harm, so it seems potentially appropriate to warn others about them (but you'd still want to be cautious, take your time to evaluate evidence carefully, and not fall prey to a smear campaign by the accused parties – see DARVO).
  • If there's no pattern/history of false accusations, but the claims by a whistleblower turn out to be misleading in more ways than one would normally expect in the heat of things (but not egregiously so), then the situation is going to be unsatisfying, but personally I'd err on the side of protecting anonymity. (I think this case is strongest the more the accused parties are more powerful/influential than the accusers.) I'd definitely protect anonymity if the accusations continue to seem plausible but are impossible to prove/there remains lots of uncertainty.
  • I think de-anonymization, if it makes sense under some circumstances, should only be done after careful investigation, and never "in the heat of the moment." In conflicts that are fought publicly, it's very common for different sides to gain momentum temporarily but then lose it again, depending on who had the last word. 

Comment by Lukas_Gloor on Effective Aspersions: How the Nonlinear Investigation Went Wrong · 2023-12-19T18:35:37.788Z · LW · GW

Very thoughtful post. I liked that you delved into this out of interest even though you aren't particularly involved in this community, but then instead of just treating it as fun but unproductive gossip, you used your interest to make a high-value contribution! 

It changed my mind in some places (I had a favorable reaction to the initial post by Ben; also, I still appreciate what Ben tried to do). 

I will comment on two points that I didn't like, but I'm not sure to what degree this changes your recommended takeaways (more on this below).

They [Kat and Emerson] made a major unforced tactical error in taking so long to respond and another in not writing in the right sort of measured, precise tone that would have allowed them to defuse many criticisms.

I don't like that this sounds like this is only (or mostly) about tone.

I updated that the lawsuit threat was indeed more about tone than I initially thought. I initially thought that any threat of a lawsuit was strong evidence that someone is a bad actor. I now think it's sometimes okay to mention the last resort of lawsuits if you think you're about to be defamed.

At the same time, I'd say it was hard for Lightcone to come away with that interpretation when Emerson used phrases like 'maximum damages permitted by law' (a phrasing optimized for intimidation). Emerson did so in the context where one of the things he was accused of was unusually hostile negotiation and intimidation tactics! So, given the context and "tone" of the lawsuit threat, I feel like it made a lot of sense for Lightcone to see their worst concerns about Emerson "confirmation-boosted" when he made that lawsuit threat.

In any case, and more to my point about tone vs. other things, I want to speak about the newer update by Nonlinear that came three months after the original post by Ben. Criticizing tone there is like saying "they lack expert skills at defusing tensions; not ideal, but also let's not be pedantic." It makes it sound like all they need to become great bosses is a bit of tactfulness training. However, I think there are more fundamental things to improve on, and these things lend a bunch of credibility to why someone might have a bad time working with them. (Also, they had three months to write that post, and it's really quite optimized for presentation in several ways, so it's not like we should apply low standards to this post.) I criticized some aspects of their post here and here.

In short, I feel like they reacted by (1) conceding little to nothing they could have done differently and (2) going on the attack with outlier-y black-and-white framings against not just Alice, but also Chloe, in a way that I think is probably more unfair/misleading/uncharitable about Chloe than what Chloe said about them. (I say "probably" because I didn't spend a lot of time re-reading Ben's original post and trying to separate which claims were made by Alice vs. Chloe, doing the same for Nonlinear's reply, and filtering out whether they're ascribing statements to Chloe with their quotes-that-aren't-quotes that she didn't actually say.)

I think that's a big deal because their reaction pattern-matches to how someone would react if they did indeed have a "malefactor" pattern of frequently causing interpersonal harm. Just like it's not okay to make misleading statements about others solely because you struggled with negative emotions in their presence, it's also (equally) not okay to make misleading statements solely because someone is accusing you of being a bad boss or leader. It can be okay to see red in the heat of battle, but it's an unfortunate dynamic because it blurs the line between people who are merely angry and hurt and people who are character-wise incapable of reacting appropriately to appropriate criticism. (This also goes into the topic of "adversarial epistemology" – if you think the existence of bad actors is a sufficient problem, you want to create social pressure for good-but-misguided actors to get their shit together and stop acting in a way/pattern that lends cover to bad actors.)

Eliezer recently re-tweeted this dismissive statement about DARVO. I think this misses the point. Sure, if the person who accuses you is a malicious liar, or deluded to a point where it has massively destructive effects and is a pattern, then, yeah, you're forced to fight back. So, point taken: sometimes the person who appears like the victim initially isn't actually the victim. However, other times the truth is at least somewhat towards the middle, i.e., the person accusing you of something may have some points. In that case, you can address what happened without character-assassinating them in return, especially if you feel like you had a lot of responsibility for them having had a bad time.

Defending Alice is not the hill I want to die on (although I'm not saying I completely trust Nonlinear's picture of her), but I really don't like the turn things took towards Chloe. I feel like it's messed up that several commenters (at one point my comment here had 9 votes and -5 overall karma, and high disagreement votes) came away with the impression that it might be appropriate to issue a community-wide warning about Chloe as someone with a pattern of being destructive (and de-anonymizing her, which would further send the signal that the community considers her a toxic person). I find that a really scary outcome for whistleblower norms in the community. Note that this isn't because I think it's never appropriate to de-anonymize someone.

Here are the values that are important to me in this whole affair and its context:

  • I want whistleblower-type stuff to come to light because I think the damage bad leaders can do is often very large
  • I want investigations to be fair. In many cases, this means giving accused parties time to respond
  • I understand that there’s a phenotype of personality where someone has a habit of bad-talking others through false/misleading/distorted claims, and I think investigations (and analysis) should be aware of that

(FWIW, I assume that most people who vehemently disagree with me about some of the things I say in this comment and elsewhere would still endorse these above values.)

So, again, I'm not saying I find this a scary outcome because I have an "always believe the victim" mentality. (Your post fortunately doesn't strawman others like that, but there were comments on Twitter and Facebook that pushed this point, which I thought was uncalled for.) 

Instead, consider for a moment the world where I'm right that:

  • Chloe isn't a large outlier on any relevant personality dimension, except perhaps that she was significantly below average at standing up for her interests/voicing her boundaries (something that might even have been selected for in the Nonlinear hiring process)

This is what I find most plausible based on a number of data points. In that world, I think something about the swing of the social pendulum went wrong if the result of Chloe sharing her concerns is that things get worse for her. (I'm not saying this is currently the case – I'm saying it would be the case if we fully bought into the framing of Nonlinear or of the people who make the most negative comments about both Chloe and Alice, without flagging that many people familiar with the issue thought that Alice was a less reliable narrator than Chloe, etc.)

Of course, I focused a lot on a person who is currently anonymized. Fair to say that this is unfair given that Nonlinear have their reputation at stake all out in the open. Like I said elsewhere, it's not like I think they deserved the full force of this.

These are tough tradeoffs to make. Unfortunately, we need some sort of policy to react to people who might be bad leaders. Among all the criticisms about Ben's specific procedure, I don't want this part to be de-emphasized.

The community mishandled this so badly and so comprehensively that inasmuch as Nonlinear made mistakes in their treatment of Chloe or Alice, for the purposes of the EA/LW community, the procedural defects have destroyed the case.

I'm curious what you mean by the clause "for the purposes of the EA/LW community." I don't want to put words into your mouth, but I'd be sympathetic to a claim that goes as follows. From a purely procedural perspective about what a fair process should look like for a community to decide that a particular group should be cut out from the community's talent pipeline (or whatever harsh measure people want to consider), it would be unfair to draw this sort of conclusion against Nonlinear based on the many flaws in the process used. If that's what you're saying, I'm sympathetic to that at the very least in the sense of "seems like a defensible view to me." (And maybe also overall – but I find it hard to think about this stuff and I'm a bit tired of the affair.) 

At the same time, I feel like, as a private individual, it's okay to come away with confident beliefs (one way or the other) from this whole thing. It takes a higher bar of evidence (and assured fairness of procedure) to decide "the community should act as though x is established consensus" than it takes to yourself believe x.

Comment by Lukas_Gloor on Nonlinear’s Evidence: Debunking False and Misleading Claims · 2023-12-17T15:31:38.513Z · LW · GW

An organization gets applications from all kinds of people at once, whereas an individual can only ever work at one org. It's easier to discreetly contact most of the most relevant parties about some individual than it is to do the same with an organization.

I also think it's fair to hold orgs that recruit within the EA or rationalist communities to slightly higher standards because they benefit directly from association with these communities.

That said, I agree with habryka (and others) that 

I think if the accusations are very thoroughly falsified and shown to be highly deceptive in their presentation, I can also imagine some scenarios where it might make sense to stop anonymizing, though I think the bar for that does seem pretty high.

Comment by Lukas_Gloor on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-23T03:04:22.122Z · LW · GW

a) A lot of your points are specifically about Altman and the board, whereas many of my points started that way but then went into the abstract/hypothetical/philosophical. At least, that's how I meant it – I should have made this more clear. I was assuming, for the sake of the argument, that we're speaking of a situation where the person in the board's position found out that someone else is deceptive to their very core, with no redeeming principles they adhere to. So, basically what you're describing in your point "I" with the lizardpeople. I focused on that type of discussion because I felt like you were attacking my principles, and I care about defending my specific framework of integrity. (I've commented elsewhere on things that I think the board should or shouldn't have done, so I also care about that, but I probably already spent too many comments on speculations about the board's actions.) 
Specifically about the actual situation with Altman, you say: 
"I'm saying that you should honor the agreement you've made to wield your power well and not cruelly or destructively. It seems to me that it has likely been wielded very aggressively and in a way where I cannot tell that it was done justly."
I very much agree with that, fwiw. I think it's very possible that the board did not act with integrity here. I'm just saying that I can totally see circumstances where they did act with integrity. The crux for me is "what did they believe about Altman and how confident were they in their take, and did they make an effort to factor in moral injunctions against using their power in a self-serving way, etc?" 

b) You make it seem like I'm saying that it's okay to move against people (and e.g. oust them) without justifying yourself later or giving them the chance to reply at some point later when they're in a less threatening position. I think we're on the same page about this: I don't believe that it would be okay to do these things. I wasn't saying that you don't have to answer for what you did. I was just saying that it can, under some circumstances, be okay to act first and then explain yourself to others later and establish yourself as still being trustworthy.

c) About your first point (point "I"), I disagree. I think you're too deontological here. Numbers do count. Being unfair to someone who you think is a bad actor but who turns out not to be has a victim count of one. Letting a bad actor take over the startup/community/world you care about has a victim count of way more than one. I also think it can be absolutely shocking how high this can go (in terms of various types of harms caused by the bad tail of bad actors) depending on the situation. E.g., think of Epstein or dictators. On top of that, there are indirect bad effects that don't quite fit the name "victim count" but that still weigh heavily, such as distorted epistemics or the destruction of a high-trust environment when it gets invaded by bad actors.

Concretely, I feel like when you talk about the importance of the variable "respect towards Altman in the context of how much notice to give him," I'm mostly thinking, sure, it would be nice to be friendly and respectful, but that's a small issue compared to considerations like "if the board is correct, how much could he mobilize opposition against them if he had a longer notice period?" So, I thought three months' notice would be inappropriate given what's asymmetrically at stake on both sides of the equation. (It might change once we factor in optics and how it'll be easier for Altman to mobilize opposition if he can say he was treated unfairly – for some reason, this always works wonders. DARVO is like dark magic. Sure, it sucks for Altman to lose a $100 billion company that he built. But an out-of-control CEO recklessly building the most dangerous tech in the world sucks more for way more people in expectation.)

In the abstract, I think it would be an unfair and inappropriate sense of what matters if a single person who is accused of being a bad actor gets more respect than their many victims would suffer in expectation. And I'm annoyed that it feels like you took the moral high ground here by making it seem like my positions are immoral. But maybe you meant the "shame on yourself" for just one isolated sentence, and not my stance as a whole. I'd find that more reasonable. In any case, I understand now that you probably feel bothered for an analogous reason, namely that I made a remark about how it's naive to be highly charitable or cooperative under circumstances where I think it's no longer appropriate. I want to flag that nothing you wrote in your newest reply seems naive to me, even though I do find it misguided. (The only thing that I thought was maybe naive was the point about three months' notice – though I get why you made it, and I generally really appreciate examples like that about concrete things the board could have done. I just think it would backfire if someone used those months to make moves against you.)

d) The "shame on yourself" referred to something where you perceived me to be tribal, but I don't really get what that was about. You write "and (c) kind of saying your tribe is the only one with good people in it." This is not at all what I was kind of saying. I was saying my tribe is the only one with people who are "naive in such-and-such specific way" in it, and yeah, that was unfair towards EAs, but then it's not tribal (I self-identify as EA), and I feel like it's okay to use hyperbole this way sometimes to point at something that I perceive to be a bit of a problem in my tribe. In any case, it's weirdly distorting things when you then accuse me of something that only makes sense if you import your frame on what I said. I didn't think of this as being a virtue, so I wasn't claiming that other communities don't also have good people. 

e) Your point "III" reminds me of this essay by Eliezer titled "Meta-Honesty: Firming Up Honesty Around Its Edge-Cases." Just like Eliezer in that essay explains that there are circumstances where he thinks you can hide info or even deceive, there are circumstances where I think you can move against someone and oust them without advance notice. If a prospective CEO interviews me as a board member, I'm happy to tell them exactly under which circumstances I would give them advance notice (or things like second and third chances) and under which ones I wouldn't. (This is what reminded me of the essay and the dialogues with the Gestapo officer.) (That said, I'd decline the role because I'd probably have overdosed on anxiety medication if I had been in the OpenAI board's position.) 
The circumstances would have to be fairly extreme for me not to give advance warnings or second chances, so if a CEO thinks I'm the sort of person who doesn't have a habit of interpreting lots of things in a black-and-white and uncharitable manner, then they wouldn't have anything to fear if they're planning on behaving well and are at least minimally skilled at trust-building/making themselves/their motives/reasons for actions transparent.

f) You say: 
"I think it is damaging to the trust people place in board members, to see them act with so little respect or honor. It reduces everyone's faith in one another to see people in powerful positions behave badly." 
I agree that it's damaging, but the way I see it, the problem here is the existence of psychopaths and other types of "bad actors" (or "malefactors"). They are why issues around trust and trustworthiness are sometimes so vexed and complicated. It would be wonderful if such phenotypes didn't exist, but we have to face reality. It doesn't actually help "the social fabric/fabric of trust" if one lends too much trust to people who abuse it to harm others and add more deception. On the contrary, it makes things worse. 

g) I appreciate what you say in the first paragraph of your point IV! I feel the same way about this. (I should probably have said this earlier in my reply, but I'm about to go to sleep and so don't want to re-alphabetize all of the points.) 

Comment by Lukas_Gloor on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-22T19:02:12.322Z · LW · GW

When I make an agreement to work closely with you on a crucial project,

I agree that there are versions of "agreeing to work closely together on the crucial project" where I see this as "speak up now or otherwise allow this person into your circle of trust." Once someone is in that circle, you cannot kick them out without notice just because you think you observed stuff that made you change your mind – if you could do that, it wouldn't work as a circle of trust.

So, there are circumstances where I'd agree with you. Whether the relationship between a board member and a CEO should be like that could be our crux here. I'd say yes in the ideal, but was it like that for the members of the board and Altman? I'd say it depends on the specific history. And my guess is that, no, there was no point where the board could have said "actually we're not yet sure we want to let Altman into our circle of trust, let's hold off on that." And there's no yes without the possibility of no.

if I think you're deceiving me, I will let you know.

I agree that one needs to do this if one has lost faith in people who once made it into one's circle of trust. However, let's assume they were never there to begin with. Then it's highly unwise to do so if you're dealing with someone without morals who feels zero obligation towards you in return. Don't give them an advance warning out of respect or a sense of moral obligation. If your mental model of the person is "this person will internally laugh at you for being stupid enough to give them advance warning and will gladly use the info you gave against you," then it would be foolish to tell them. Batman shouldn't tell the Joker that he's coming for him.

I may move quickly to disable you if it's an especially extreme circumstance but I will acknowledge that this is a cost to our general cooperative norms where people are given space to respond even if I assign a decent chance to them behaving poorly.

What I meant to say in my initial comment is the same thing as you're saying here.

"Acknowledging the cost" is also an important thing in how I think about it, (edit) but I see that cost as not being towards the Joker (respect towards him), but towards the broader cooperative fabric. [Edit: deleted a passage here because it was long-winded.]

"If I assign a decent chance to them behaving poorly" – note that in my description, I spoke of practical certainty, not just "a decent chance that." Even in contexts where I think mutual expectations of trustworthiness and cooperativeness are lower than in what I call "circles of trust," I'm all in favor of preserving respect up until way past the point where you're just a bit suspicious of someone. It's just that, if the stakes are high and if you're not in a high-trust relationship with the person (i.e., you don't have a high prior that they're for sure cooperating back with you), there has to come a point where you'll stop giving them free information that could harm you. 

I admit this is a step in the direction of act utilitarianism, and act utilitarianism is a terrible, wrong ideology. However, I think it's only a step and not all the way, and there's IMO a way to codify rules/virtues where it's okay to take these steps and you don't get into a slippery slope. We can have a moral injunction where we'd only make such moves against other people if our confidence is significantly higher than it needs to be on mere act utilitarian grounds. Basically, you either need smoking-gun evidence of something sufficiently extreme, or need to get counsel from other people and see if they agree to filter out unilateralism in your judgment, or have other solutions/safety-checks like that before allowing yourself to act.

I think what further complicates the issue is that there are "malefactor types" who are genuinely concerned about doing the right thing and where it looks like they're capable of cooperating with people in their inner circle, but then they are too ready to make huge rationalization-induced updates (almost like "splitting" in BPD) that the other party was bad all along and is now out of the circle. Their inner circle is way too fluid and their true circle of trust is only themselves. The existence of this phenotype means that if someone like that tries to follow the norms I just advocated, they will do harm. How do I incorporate this into my suggested policy? I feel like this is analogous to discussions about modest epistemology vs non-modest epistemology. What if you're someone who's deluded to think he's Napoleon/some genius scientist? If someone is deluded like that, non-modest epistemology doesn't work. To this, I say "epistemology is only helpful if you're not already hopelessly deluded." Likewise, what if your psychology is hopelessly self-deceiving and you'll do on-net-harmful self-serving things even when you try your best not to do them? Well, sucks to be you (or rather, sucks for other people that you exist), but that doesn't mean that the people with a more trust-compatible psychology have to change the way they go about building a fabric of trust that importantly also has to be protected against invasion from malefactors.

I actually think it's a defensible position to say that the temptation to decide who is or isn't "trustworthy" is too big and humans need moral injunctions and that Batman should give the Joker an advance warning, so I'm not saying you're obviously wrong here, but I think my view is defensible as well, and I like it better than yours and I'll keep acting in accordance with it. (One reason I like it better is because if I trust you and you play "cooperate" with someone who only knows deception and who moves against you and your cooperation partners and destroys a ton of value, then I shouldn't have trusted you either. Being too indiscriminately cooperative makes you less trustworthy in a different sort of way.)

Shame on you for suggesting only your tribe knows or cares about honoring partnerships with people after you've lost trust in them. Other people know what's decent too.

I think there's something off about the way you express whatever you meant to express here – something about how you're importing your frame of things over mine and claim that I said something in the language of your frame, which makes it seem more obviously bad/"shameful" than if you expressed it under my frame. 

[Edit: November 22nd, 20:46 UK time. Oh, I get it now. You totally misunderstood what I meant here! I was criticizing EAs for doing this too naively. I was not praising the norms of my in-group (EA). Your reply actually confused me so much that I thought you were being snarky at me in some really strange way. Like, I thought you knew I was criticizing EAs. I guess you might identify as more of a rationalist than an EA, so I should have said "only EAs and rationalists" to avoid confusion. And like I say below, this was somewhat hyperbolic.]

In any case, I'd understand it if you said something like "shame on you for disclosing to the world that you think of trust in a way that makes you less trustworthy (according to my, Ben's, interpretation)." If that's what you had said, I'm now replying that I hope that you no longer think this after reading what I elaborated above.

Edit: And to address the part about "your tribe" – okay, I was being hyperbolic about only EAs having a tendency to be (what-I-consider-to-be) naive when it comes to applying norms of cooperation. It's probably also common in other high-trust ideological communities. I think it actually isn't very common in Silicon Valley, which very much supports my point here. When people get fired or backstabbed over startup drama (I'm thinking of the movie The Social Network), they are not given a three-month adjustment period where nothing really changes except that they now know what's coming. Instead, they have their privileges revoked and passwords changed and have to leave the building. I think focusing on how much notice someone has given is more a part of the power struggle and war over who has enough leverage to get others on their side than it is genuinely about "this particular violation of niceness norms is so important that it deserves to be such a strong focus of this debate." Correspondingly, I think people would complain a lot less about how much notice was given if the board had done a better job convincing others that their concerns were fully justified. (Also, Altman himself certainly wasn't going to give Helen a lot of time still staying on the board and adjusting to the upcoming change, still talking to others about her views and participating in board stuff, etc., when he initially thought he could get rid of her.)

Comment by Lukas_Gloor on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-22T11:00:28.547Z · LW · GW

Maybe, yeah. I definitely strongly agree that not telling the staff a more complete story seems bad for both intrinsic and instrumental reasons. 

I'm a bit unsure how wise it would be to tip Altman off in advance given what we've seen he can mobilize in support of himself. 

And I think only EAs would think up the idea that it's valuable to be cooperative towards people who you're convinced are deceptive/lack integrity. [Edit: You totally misunderstood what I meant here; I was criticizing them for doing this too naively. I was not praising the norms of my in-group. Your reply actually confused me so much that I thought you were being snarky in some really strange way.] Of course, they have to consider all the instrumental reasons for it, such as how it'll reflect on them if others don't share their assessment of the CEO lacking integrity. 

Comment by Lukas_Gloor on OpenAI: Facts from a Weekend · 2023-11-22T03:57:29.369Z · LW · GW

Hm, to add a bit more nuance, I think it's okay at a normal startup for a board to be comprised of people who are likely to almost always side with the CEO, as long as they are independent thinkers who could vote against the CEO if the CEO goes off the rails. So, it's understandable (or even good/necessary) for CEOs to care a lot about having "aligned" people on the board, as long as they don't just add people who never think for themselves.

It gets more complex in OpenAI's situation where there's more potential for tensions between CEO and the board. I mean, there shouldn't necessarily be any tensions, but Altman probably had less of a say over who the original board members were than a normal CEO at a normal startup, and some degree of "norms-compliant maneuvering" to retain board control feels understandable because any good CEO cares a great deal about how to run things. So, it actually gets a bit murky and has to be judged case-by-case. (E.g., I'm sure Altman feels like what happened vindicated him wanting to push Helen off the board.) 

Comment by Lukas_Gloor on OpenAI: Facts from a Weekend · 2023-11-22T03:24:45.109Z · LW · GW

Yeah, that makes sense and does explain most things, except that if I was Helen, I don't currently see why I wouldn't have just explained that part of the story early on?* Even so, I still think this sounds very plausible as part of the story.

*Maybe I'm wrong about how people would react to that sort of justification. Personally, I think the CEO messing with the board constitution to gain de facto ultimate power is clearly very bad and any good board needs to prevent that. I also believe that it's not a reason to remove a board member if they publish a piece of research that's critical of or indirectly harmful for your company. (Caveat that we're only reading a secondhand account of this, and maybe what actually happened would make Altman's reaction seem more understandable.) 

Comment by Lukas_Gloor on OpenAI: Facts from a Weekend · 2023-11-22T02:55:33.341Z · LW · GW

One thing I've realized more in the last 24h: 

  • It looks like Sam Altman is using a bunch of "tricks" now trying to fight his way back into more influence over OpenAI. I'm not aware of anything I'd consider unethical (at least if one has good reasons to believe one has been unfairly attacked), but it's still the sort of stuff that wouldn't come naturally to a lot of people and wouldn't feel fair to a lot of people (at least if there's a strong possibility that the other side is acting in good faith too).
  • Many OpenAI employees have large monetary incentives on the line and there's levels of peer pressure that are off the charts, so we really can't read too much into who tweeted how many hearts or signed the letter or whatever. 

Maybe the extent of this was obvious to most others, but for me, while I was aware that this was going on, I feel like I underestimated the extent of it. One thing that put things into a different light for me is this tweet.

Which makes me wonder: could things really have gone down a lot differently? Sure, smoking-gun-type evidence would've helped the board immensely. But is it their fault that they don't have it? Not necessarily. Suppose they had (1) time pressure (for one reason or another – hard to know at this point) and (2) still enough 'soft' evidence to justify drastic actions. With (1) and (2) together, it could have made sense to risk intervening even without smoking-gun-type evidence.

(2) might be a crux for some people, but I believe that there are situations where it's legitimate for a group of people to become convinced that someone else is untrustworthy without being in a position to easily and quickly convince others. NDAs in play could be one reason, but also just "the evidence is of the sort that 'you had to be there'" or "you need all this other context and individual data points only become compelling if you also know about all these other data points that together help rule out innocuous/charitable interpretations about what happened."

In any case, many people highlighted the short notice with which the board announced their decision and commented that this implies that the board acted in an outrageous way and seems inexperienced. However, having seen what Altman managed to mobilize in just a couple of days, it's now obvious that, if you think he's scheming and deceptive in a genuinely bad way (as opposed to "someone knows how to fight power struggles and is willing to fight them when he feels like he's undeservedly under attack" – which isn't by itself a bad thing), then you simply can't give him a head start. 

So, while I still think the board made mistakes, I today feel a bit less confident that these mistakes were necessarily as big as I initially thought. I now think it's possible – but far from certain – that we're in a world where things are playing out the way they have mostly because it's a really tough situation for the board to be in even when they are right. And, sure, that would've been a reason to consider not starting this whole thing, but obviously that's very costly as well, so, again, tough situation.

I guess a big crux is "how common is it that you justifiably think someone is bad but it'll be hard to convince others?" My stance is that, if you're right, you should eventually be able to convince others if the others are interested in the truth and you get a bunch of time and the opportunity to talk to more people who may have extra info. But you might not be able to succeed if you only have a few days and then you're out if you don't sound convincing enough.

My opinions have been fluctuating a crazy amount recently (I don't think I've ever been in a situation where my opinions have gone up and down like this!), so, idk, I may update quite a bit in the other direction again tomorrow.

Comment by Lukas_Gloor on OpenAI: Facts from a Weekend · 2023-11-22T00:42:50.502Z · LW · GW

Having a "plan A" requires detailed advance-planning. I think it's much more likely that their decision was reactive rather than plan-based. They felt strongly that Altman had to go based on stuff that happened, and so they followed procedures – appoint an interim CEO and do a standard CEO search. Of course, it's plausible – I'd even say likely – that an "Anthropic merger" was on their mind as something that could happen as a result of this further down the line. But I doubt (and hope not) that this thought made a difference to their decision.

Reasoning:

  • If they had a detailed plan that was motivating their actions (as opposed to reacting to a new development and figuring out what to do as things go on), they would probably have put in a bit more time gathering more potentially incriminating evidence or trying to form social alliances. 
    For instance, even just, in the months or weeks before, visiting OpenAI and saying hi to employees, introducing themselves as the board, etc., would probably have improved staff's perception of how this went down. Similarly, gathering more evidence by, e.g., talking to people close to Altman but sympathetic to safety concerns, asking whether they feel heard in the company, etc., could have unearthed more ammunition. (It's interesting that even the safety-minded researchers at OpenAI basically sided with Altman here, or, at the very least, none of them came to the board's aid by speaking up against Altman on similar counts. [Though I guess it's hard to speak up "on similar counts" if people don't even really know their primary concerns apart from the vague "not always candid."])
  • If the thought of an Anthropic merger did play a large role in their decision-making (in the sense of "making the difference" to whether they act on something across many otherwise-similar counterfactuals), that would constitute a bad kind of scheming/plotting. People who scheme like that are probably less likely than baseline to underestimate power politics and the difficulty of ousting a charismatic leader, and more likely than baseline to prepare well for the fight. Like, if you think your actions are perfectly justified per your role as board member (i.e., if you see yourself as acting as a good board member), that's exactly the situation in which you're most likely to overlook the possibility that Altman may just go "fuck the board!" and ignore your claim to legitimacy. By contrast, if you're kind of aware that you're scheming and using the fact that you're a board member merely opportunistically, it might more readily cross your mind that Altman might scheme back at you and use the fact that he knows everyone at the company and has a great reputation in the Valley at large.
  • It seems like the story feels overall more coherent if the board perceived themselves to be acting under some sort of time-pressure (I put maybe 75% on this).
    • Maybe they felt really anxious or uncomfortable with the 'knowledge' or 'near-certainty' (as it must have felt to them, if they were acting as good board members) that Altman is a bad leader, so they sped things up because it was psychologically straining to deal with the uncertain situation.
    • Maybe Altman approaching investors made them worry that if he succeeds, he'd acquire too much leverage.
    • Maybe Ilya approached them with something and prompted them to react to it and do something, and in the heat of the moment, they didn't realize that it might be wise to pause and think things through and see if Ilya's mood is a stable one.
    • Maybe there was a capabilities breakthrough and the board and Ilya were worried the new system may not be safe enough especially considering that once the weights leak, people anywhere on the internet can tinker with the thing and improve it with tweaks and tricks. 
    • [Many other possibilities I'm not thinking of.]
    • [Update – I posted this point before gwern's comment, but didn't realize that it's waaay more likely to be the case than the other ones until he said it] I read a rumor in a new article about talks to replace another board member, so maybe there was time pressure to act before Altman and Brockman could appoint a new board member who would always side with them. 

were surprised when he rejected them

I feel like you're not really putting yourself into the shoes of the board members if you think that, by the time they were asking around for CEOs, they were surprised that someone like Dario (with the reputation of his entire company at risk) would reject them. At that point, the whole situation was such a mess that they must have felt extremely bad and desperate going around frantically asking for someone to come in and help save the day. (But probably you just phrased it like that because you suspect that, in their initial plan where Altman just accepts defeat, their replacement CEO search would go over smoothly. That makes sense to me conditional on them having formed such a detailed-but-naive "plan A.")

Edit: I feel confident in my stance but not massively so, so I reserve maybe 14% for a hypothesis that is more like the one you suggested, partly updating towards habryka's cynicism, which I unfortunately think has had a somewhat good track record recently.

Comment by Lukas_Gloor on OpenAI: Facts from a Weekend · 2023-11-20T18:13:47.149Z · LW · GW

Yeah, but if this is the case, I'd have liked to see a bit more balance than just retweeting the tribal-affiliation slogan ("OpenAI is nothing without its people") and saying that the board should resign (or, in Ilya's case, implying that he regrets and denounces everything he initially stood for together with the board). Like, I think it's a defensible take to think that the board should resign after how things went down, but the board was probably pointing to some real concerns that won't get addressed at all if the pendulum now swings way too much in the opposite direction, so I would have at least hoped for something like "the board should resign, but here are some things that I think they had a point about, which I'd like to not see swept under the carpet after the counter-revolution."

Comment by Lukas_Gloor on Sam Altman fired from OpenAI · 2023-11-20T11:00:04.601Z · LW · GW

It was weird anyway that they had LeCun in charge and a thing called a "Responsible AI team" in the same company. No matter what one thinks about Sam Altman now, the things he said about AI risks sounded 100 times more reasonable than what LeCun says.

Comment by Lukas_Gloor on Integrity in AI Governance and Advocacy · 2023-11-20T01:51:21.407Z · LW · GW

Okay, that's fair.

FWIW, I think it's likely that they thought about this decision for quite some time and systematically – I mean the initial announcement did mention something about a "deliberative review process by the board." But yeah, we don't get to see any of what they thought about or who (if anyone) they consulted for gathering further evidence or for verifying claims by Sutskever. Unfortunately, we don't know yet. And I concede that given the little info we have, it takes charitable priors to end up with "my view." (I put it in quotation marks because it's not like I have more than 50% confidence in it. Mostly, I want to flag that this view is still very much on the table.) 

Also, on the part about "imply that Sam had done some pretty serious deception, without anything to back that up with." I'm >75% that either Eliezer nailed it in this tweet, or they actually have evidence about something pretty serious but decided not to disclose it for reasons that have to do with the nature of the thing that happened. (I guess the third option is they self-deceived into thinking their reasons to fire Altman will seem serious/compelling [or at least defensible] to everyone to whom they give more info, when in fact the reasoning is more subtle/subjective/depends on additional assumptions that many others wouldn't share. This could then have become apparent to them when they had to explain their reasoning to OpenAI staff later on, and they aborted the attempt in the middle of it when they noticed it wasn't hitting well, leaving the other party confused. I don't think that would necessarily imply anything bad about the board members' character, though it is worth noting that if someone self-deceives in that way too strongly or too often, it makes for a common malefactor pattern, and obviously it wouldn't reflect well on their judgment in this specific instance. One reason I consider this hypothesis less likely than the others is because it's rare for several people – the four board members – to all make the same mistake about whether their reasoning will seem compelling to others, and for none of them to realize that it's better to err on the side of caution and instead say something like "we noticed we have strong differences in vision with Sam Altman," or something like that.)

Comment by Lukas_Gloor on Integrity in AI Governance and Advocacy · 2023-11-20T00:36:35.257Z · LW · GW

[...] reputational trade (OpenAI got to hire a bunch of talent from EA spaces and make themselves look responsible to the world) [...]

Yes, I think "reputational trade," i.e., something that's beneficial for both parties, is an important part of the story that the media hasn't really picked up on. EAs were focused on the dangers and benefits from AI way before anyone else, so it carries quite some weight when EA opinion leaders put an implicit seal of approval on the new AI company. 

There's a tension between 
(1) previously having held back on natural-seeming criticism of OpenAI ("putting the world at risk for profits" or "they plan on wielding this immense power of building god/single-handedly starting something bigger than the next Industrial Revolution/making all jobs obsolete and solving all major problems") because they have the seal of approval from this public good, non-profit, beneficial-mission-focused board structure, 

and 

(2) being outraged when this board structure does something that it was arguably intended to do (at least under some circumstances).

(Of course, the specifics of how and why things happened matter a lot, and maybe most people aren't outraged because the board did something, but rather because of how they did it or based on skepticism about reasons and justifications. On those later points, I sympathize more with people who are outraged or concerned that something didn't go right. But we don't know all the details yet.)

Comment by Lukas_Gloor on Altman firing retaliation incoming? · 2023-11-19T14:50:04.850Z · LW · GW

That's a good point in theory, but you'd think the board members could also speak to the media and say "no, they are lying; we haven't so far had the talks about reverting the decision that they claim to have had with us." 

[Update Nov 20th, seems like you were right and the board maybe didn't have good enough media connections to react to this, or were too busy fighting their fight on another front.]

Comment by Lukas_Gloor on Altman firing retaliation incoming? · 2023-11-19T14:23:33.393Z · LW · GW

Maybe I'm wrong about how all of this will be painted by the media, and the public / government's perceptions

So far, a lot of the media coverage has framed the issue so that the board comes across as inexperienced and their mission-related concerns come across more like an overreaction than reasonable criticism of profit-oriented or PR-oriented company decisions that put safety at risk when building the most dangerous technology.

I suspect that this is mostly a function of how things went down, how these differences of vision between them and Altman came into focus, rather than a feature of the current discourse window – we've seen that media coverage about AI risk concerns (and public reaction to it) isn't always negative. So, I think you're right that there are alternative circumstances where it would look quite problematic for Microsoft if they tried to interfere with or circumvent the non-profit board structure. Unfortunately, it might be a bit late to change the framing now. 

But there's still the chance that the board is sitting on more info and struggled to coordinate their communications amidst all the turmoil so far, or have other reasons for not explaining their side of things in a more compelling manner.

(My comment is operating under the assumption that it's indeed true that Altman isn't the sort of cautious good leader that one would want for the whole AI thing to go well. I personally think this might well be the case, but I want to flag that my views here aren't very resilient because I have little info, and also I'm acknowledging that seeing the outpouring of support for him is at least moderate evidence of him being a good leader [but one should also be careful not to overupdate on this type of evidence of someone being well-liked in a professional network]. And by "good leader" I don't just mean "can make money for companies" – Elon Musk is also good at making money for companies, but I'm not sure people would come to his support in the same way they came to Altman's support, for instance. Also, I think "being a good leader" is way more important than "having the right view on AI risk," because who knows for sure what the exact right view is – the important thing is that a good leader will incrementally make sizeable updates in the right direction as more information comes in through the research landscape.) 

Comment by Lukas_Gloor on Integrity in AI Governance and Advocacy · 2023-11-19T02:54:31.997Z · LW · GW

As you'd probably agree, it's plausible that Sutskever was able to convince the board about specific concerns based on his understanding of the technology (risk levels and timelines) or his day-to-day experience at OpenAI and direct interactions with Sam Altman. If that's what happened, then it wouldn't be fair to say that the EA-minded board members just acted in an ideologically-driven way. (Worth pointing out for people who don't know this that Sutskever has no ties to EA; it just seems like he shares concerns about the dangers from AI.)

But let's assume that it comes out that EA board members played a really significant role or were even thinking about something like this before Sutskever brought up concerns. "Play of power" evokes connotations of opportunism and there being no legitimacy for the decision other than that the board thought they could get away with it. This sort of concern you're describing would worry me a whole lot more if OpenAI had a typical board and corporate structure.

However, since they have a legal structure and mission that emphasizes benefitting humanity as a whole and not shareholders, I'd say situations like the one here are (in theory) exactly why the board was set up that way. The board's primary task is overseeing the CEO. To achieve OpenAI's mission, the CEO needs to have the type of personality and thinking habits such that he will likely converge toward whatever the best-informed views are about AI risks (and benefits) and how to mitigate (and actualize) them. The CEO shouldn't be someone who is unlikely to engage in the sort of cognition that one would perform if one cared greatly about long-run outcomes rather than near-term status and took seriously the chance of being wrong about one's AI risk and timeline assumptions. Regardless of what's actually true about Altman, it seems like the board came to a negative conclusion about his suitability. In terms of how they made this update, we can envision different scenarios: some of them would seem unfair to Altman and "ideology-driven" in a sinister way, while others would seem legitimate. (The following scenarios will take for granted that the thing that happened had elements of an "AI safety coup," as opposed to a "Sutskever coup" or "something else entirely." Again, I'm not saying that any of this is confirmed; I'm just going with the hypothesis where the EA involvement has the most potential for controversy.) So, here are three variants of how the board could have updated that Altman is not suitable for the mission:

(1) The responsible board members (could just be a subset of the ones that voted against Altman rather than all four of them) never gave him much of a chance. They learned that Altman is less concerned about AI notkilleveryoneism than they would've liked, so they took an opportunity to try to oust him. (This is bad because it's ideology-driven rather than truth-seeking.) 

(2) The responsible board members did give Altman a chance initially, but he deceived them in a smoking-gun-type breach of trust.

(3) The responsible board members did give Altman a chance initially, but they became increasingly disillusioned through a more insincere-vibes-based and gradual erosion of trust, perhaps accompanied by disappointments from empty promises/assurances about, e.g., taking safety testing more seriously for future models, avoiding racing dynamics/avoiding giving out too much info on how to speed up AI through commercialization/rollouts, etc. (I'm only speculating here with the examples I'm giving, but the point is that if the board is unusually active about looking into stuff, it's conceivable that they maybe-justifiably reached this sort of update even without any smoking-gun-type breach of trust.) 

Needless to say, (1) would be very bad board behavior and would put EA in a bad light. (2) would be standard stuff about what boards are there for, but seems somewhat unlikely to have happened here based on the board not being able to easily give more info to the public about what Altman did wrong (as well as the impression I get that they don't hold much leverage in the negotiations now). (3) seems most likely to me, and it's also quite complex to make judgments about the specifics, because lots of things can fall into (3). (3) requires an unusually "active/observant" board. This isn't necessarily bad. I basically want to flag that I see lots of (3)-type scenarios where the board acted with integrity and courage, but also (admittedly) probably displayed some inexperience by not preparing for the power struggle that results after a decision like this, and by (possibly?) massively mishandling communications, using wording that may perfectly describe what happened when the description is taken literally, but is very misleading when we apply the norms about how parting-ways announcements are normally written in very tactful corporate speak. (See also Eliezer's comment here.) Alternatively, it's also possible that a (3)-type scenario happened, but the specific incremental updates were uncharitable towards Altman due to being tempted by "staging a coup," or stuff like that. It gets messy when you have to evaluate someone's leadership fit where they have a bunch of uncontested talents but also some orange flags and you have to decide what sort of strengths or weaknesses are most essential for the mission.

Comment by Lukas_Gloor on Integrity in AI Governance and Advocacy · 2023-11-18T21:30:20.444Z · LW · GW

Like, I am quite worried that we will end up with some McCarthy-esque immune reaction to EA people in the US and the UK government where people will be like "wait, what the fuck, how did it happen that this weirdly intense social group with strong shared ideology is now suddenly having such an enormous amount of power in government? Wow, I need to kill this thing with fire, because I don't even know how to track where it is, or who is involved, so paranoia is really the only option". 

This is looking increasingly prescient. 

[Edit to add context]

Not saying this is happening now, but after the board decisions at OpenAI, I could imagine more people taking notice. Hopefully the sentiment then will just be open discourse and acknowledging that there's now this interesting ideology besides partisan politics and other kinds of lobbying/influence-seeking that are already commonplace. But to get there, I think it's plausible that EA has some communications- and maybe trust-building work to do. 

Comment by Lukas_Gloor on Social Dark Matter · 2023-11-17T11:39:58.772Z · LW · GW

Many people are psychopaths, but most psychopaths do not lack empathy... they just disagree with some effective altruist ideas?

Lacking affective empathy is arguably one of the most defining characteristics of psychopathy, so I don't think this is a good example.

Instead, think of all the peripheral things that people associate with psychopathy and that probably do correlate with it (and often serve as particularly salient examples or cause people to get "unmasked"), but may not always go together. Take those away.

For instance, 

  • Sadism – you can have psychopaths who are no more sadistic than your average person. (Many may still act more sadistically on average because they don't have prosocial emotions counterbalancing the low-grade sadistic impulses that you'd also find in lots of neurotypical people.)
  • Lack of conscientiousness and "parasitic lifestyle" – some psychopaths have self-control and may even be highly conscientious. See the claims about how, in high-earning professions that reward ruthlessness (e.g., banking, some types of law, some types of management), a surprisingly high percentage of high-performers have psychopathic traits.
  • Disinterested in altruism of any sort – some psychopaths may be genuinely into EA but may be tempted to implement it more SBF-style. (See also my comment here.)
  • Obsessed with social standing/power/interpersonal competition – some may focus on excelling at a non-social hobby ("competition with nature") that provides excitement that would otherwise be missing in an emotionally dulled life (something about "shallow emotions" is IMO another central characteristic of psychopathy, though there's probably more specificity to it). E.g., I wouldn't be shocked if the free-solo climber Alex Honnold were on some kind of "psychopathy spectrum," but I might be wrong (and if he was, I'd still remain a fan). I'm only speculating based on things like "they measured his amygdala activation and it was super small."

Comment by Lukas_Gloor on Integrity in AI Governance and Advocacy · 2023-11-04T01:20:56.556Z · LW · GW

Interesting discussion! 

Probably somewhat controversially, but I've been kind of happy about the Politico pieces that have been published. We had two that basically tried to make the case there is an EA conspiracy in DC that has lots of power in a kind of unaccountable way.

Maybe someone could reach out to the author and be like "Ok, yeah, we are kind of a bit conspiratorial, sorry about that. But I think let's try to come clean, I will tell you all the stuff that I know, and you take seriously the hypothesis that we really aren't doing this to profit off of AI, but because we are genuinely concerned about catastrophic risks from AI".

I like that you imagine conversations like that in your head and that they sometimes go well there! 

Seems important to select the right journalist if someone were to try this. I feel like the journalist would have to be sympathetic already or at least be a very reasonable and fair-minded person. Unfortunately, some journalists cannot think straight for the life of them and only make jumpy, shallow associations like "seeks influence, so surely this person must be selfish and greedy."

I didn't read the Politico article yet, but given that "altruism" is literally in the name with "EA," I wonder why it needs to be said "and you take seriously the hypothesis that we really aren't doing this to profit off of AI." If a journalist is worth his or her salt, and they write about a movement called EA, shouldn't a bunch of their attention go into the question of why/whether some of these people might be genuine? And if the article takes a different spin and they never even consider that question, doesn't it suggest something is off? (Again, haven't read the article yet – maybe they do consider it or at least leave it open.)

Comment by Lukas_Gloor on Book Review: Going Infinite · 2023-10-25T20:05:44.246Z · LW · GW

Obviously I agree with this. I find it strange that you would take me to be disagreeing with this and defending some sort of pure pleasure version of utilitarianism. What I said was that I care about "meaning, fulfillment, love"—not just suffering, and not just pleasure either.

That seems like a misunderstanding – I didn't mean to be saying anything about your particular views!

I only brought up classical hedonistic utilitarianism because it's a view that many EAs still place a lot of credence on (it seems more popular than negative utilitarianism?). Your comment seemed to me to be unfairly singling out something about (strongly/exclusively) suffering-focused ethics. I wanted to point out that there are other EA-held views (not yours) where the same criticism applies just as much or (arguably) even more.

Comment by Lukas_Gloor on Book Review: Going Infinite · 2023-10-25T14:54:06.376Z · LW · GW

Great review and summary! 

I followed the aftermath of FTX and the trial quite closely and I agree with your takes. 

Also +1 to mentioning the suspiciousness around Alameda's dealings with tether. It's weird that this doesn't get talked about much, so far.

On the parts of your post that contain criticism of EA: 

We are taking many of the brightest young people. We are telling them to orient themselves as utility maximizers with scope sensitivity, willing to deploy instrumental convergence. Taught by modern overprotective society to look for rules they can follow so that they can be blameless good people, they are offered a set of rules that tells them to plan their whole lives around sacrifices on an alter, with no limit to the demand for such sacrifices. And then, in addition to telling them to in turn recruit more people to and raise more money for the cause, we point them into the places they can earn the best ‘career capital’ or money or ‘do the most good,’ which more often than not have structures that systematically destroy these people’s souls. 
SBF was a special case. He among other things, and in his own words, did not have a soul to begin with. But various versions of this sort of thing are going to keep happening, if we do not learn to ground ourselves in real (virtue?!) ethics, in love of the world and its people.

[...]

Was there a reckoning, a post-mortem, an update, for those who need one? Somewhat. Not anything like enough. There was a rush to deontology that died away quickly, mostly retreating back into its special enclave of veganism. There were general recriminations. There were lots of explicit statements that no, of course we did not mean that and of course we do not endorse any of that, no one should be doing any of that. And yes, I think everyone means it. But it’s based on, essentially, unprincipled hacks on top of the system, rather than fixing the root problem, and the smartest kids in the world are going to keep noticing this. We need to instead dig into the root causes, to design systems and find ways of being that do not need such hacks, while still preserving what makes such real efforts to seek truth and change the world for the better special in the first place.

Interesting take! I'm curious to follow the discussion around this that your post inspired. 

I wish someone who is much better than me at writing things up in an intelligible and convincing fashion would make a post with some of the points I made here. In particular, I would like to see more EAs acknowledge that longtermism isn't true in any direct sense, but rather, that it's indirectly about the preferences of us as altruists (see the section, "Caring about the future: a flowchart"). Relatedly, EAs would probably be less fanatic about their particular brand of maximizing morality if they agreed that "What's the right maximizing morality?" has several defensible answers, so those of us who make maximizing morality a part of their life goals shouldn't feel like they'd have moral realism on their side when they consider overruling other people's life goals. Respecting other people's life goals, even if they don't agree with your maximizing morality, is an ethical principle that's at least as compelling/justified from a universalizing, altruistic stance as any particular brand of maximizing consequentialism.

Comment by Lukas_Gloor on Book Review: Going Infinite · 2023-10-25T14:17:54.839Z · LW · GW

I also like the quote. I consider meaning and fulfillment of life goals morally important, so I'm against one-dimensional approaches to ethics.

However, I think it's a bit unfair that just because the quote talks about suffering (and not pleasure/positive experience), you then go on to talk exclusively about suffering-focused ethics.

Firstly, "suffering-focused ethics" is an umbrella term that encompasses several moral views, including very much pluralistic ones (see the start of the Wikipedia article or the start of this initial post).

Second, even if (as I do from here on) we assume that you're talking about "exclusively suffering-focused views/axiologies," which I concede make up a somewhat common minority of views in EA at large and among suffering-focused views in particular, I'd like to point out that the same criticism (of "map-and-territory confusion") applies just as much, if not more strongly, against classical hedonistic utilitarian views. I would also argue that classical hedonistic utilitarianism has had, at least historically, more influence among EAs and that it describes better where SBF himself was coming from (not that we should give much weight to this last bit).

To elaborate, I would say the "failure" (if we want to call it that) of exclusively suffering-focused axiologies is incompleteness rather than mistakenly reifying a proxy metric for its intended target. (Whereas the "failure" of classical hedonism is, IMO, also the latter.) I think suffering really is one of the right altruistic metrics.

The IMO best answer to "What constitutes (morally relevant) suffering?" is something that's always important to the being that suffers. I.e., suffering is always bad (or, in its weakest forms, suboptimal) from the perspective of the being that suffers. I would define suffering as an experienced need to change something about one's current experience. (Or end said experience, in the case of extreme suffering.)

(Of course, not everyone who subscribes to a form of suffering-focused ethics would see it that way – e.g., people who see the experience of pain asymbolia as equally morally disvaluable as what we ordinary call "pain" have a different conception of suffering. Similarly, I'm not sure whether Brian Tomasik's pan-everythingism about everything would give the same line of reasoning as I would for caring a little about "electron suffering," or whether this case is so different and unusual that we have to see it as essentially a different concept.) 

And, yeah, bringing to our mind the distinction between map and territory, when we focus on the suffering beings and not the suffering itself, we can see that there are some sentient beings ("moral persons" according to Singer) to whom things other than their experiences can be important.

Still, I think the charge "you confuse the map for the territory, the measure for the man, the math with reality" sticks much better against classical hedonistic utilitarianism. After all, take the classical utilitarian's claim "pleasure is good." I've written about this in a short form on the EA forum.  As I would summarize it now, when we talk about "pleasure is good," there are two interpretations behind this that can be used for motte-and-bailey. I will label these two claims "uncontroversial" and "controversial." Note how the uncontroversial claim has only vague implications, whereas the controversial one has huge and precise implications (maximizing hedonist axiology). 

(1) Uncontroversial claim: When we say that pleasure is good, we mean that all else equal, pleasure is always unobjectionable, and often it is what we higher-order desire.

This uncontroversial claim is compatible with "other things also matter morally."

(For comparison, the uncontroversial interpretation for "suffering is bad" is "all else equal, suffering is always [at least a bit] objectionable, and often something we higher-order desire against.")

(2) Controversial claim: When we say that pleasure is good, what we mean is that we ought to be personal hedonist maximizers. This includes claims like "all else equal, more pleasure is always better than less pleasure," among a bunch of other things.

"All else equal, more pleasure is always better than less pleasure" seems false. At the very least, it's really controversial (that's why it's not part of the uncontroversial claim, where it just says "pleasure is always unobjectionable.") 

When I'm cozily in bed half-asleep and cuddled up next to my soulmate and I'm feeling perfectly fulfilled in life in this moment, the fact that my brain's molecules aren't being used to generate even more hedons is not a problem whatsoever. 

By contrast, "all else equal, more suffering is always worse than less suffering" seems to check out – that's part of the uncontroversial interpretation of "suffering is bad." 

So, "more suffering is always worse" is uncontroversial, while "more intensity of positive experience is always better (in a sense that matters morally and is worth tradeoffs)" is controversial. 

That's why I said the following earlier on in my comment here: 

I would say the "failure" (if we want to call it that) of exclusively suffering-focused axiologies is incompleteness rather than mistakenly reifying a proxy metric for its intended target. (Whereas the "failure" of classical hedonism is, IMO, also the latter.) I think suffering really is one of the right altruistic metrics.

But "maximize hedons" isn't. 

The point to notice for proponents of an exclusively suffering-focused axiology is that humans have two motivational systems, not just the system-1 motivation that I see as being largely about the prevention of short-term cravings/suffering. Next to that, there are also higher-order, "reflective" desires. These reflective desires are often (though not in everyone) about (specific forms of) happiness or things other than experiences (or, as a perhaps better way to express this, they are also about how specific experiences are embedded in the world, their contact with reality.) 

Comment by Lukas_Gloor on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-09T13:37:40.087Z · LW · GW

Woah! That's like 10x more effort than I expect >90% of difficult-to-communicate-with people will go through. 

Kudos to Nate for that.

There are things that I really like about the document, but I feel like I'd need to know more about its reason for being created to say whether this deserves kudos.

It seems plausible that the story went something like this: "Nate had so much social standing that he was allowed/enabled to do what most 'difficult to interact with' people couldn't, namely to continue in their mannerisms without making large changes, and still not suffer from a reduction of social standing. Partly to make this solution palatable to others and to proactively address future PR risks and instances of making people sad (since everyone already expected/was planning for more such instances to come up), Nate wrote this document."

If an org is going to have this sort of approach to its most senior researcher, it's still better to do it with a document of this nature than without.

But is this overall a great setup and strategy? I'm doubtful. (Not just for the org as a whole, but also long-term for Nate himself.)

Comment by Lukas_Gloor on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-09T09:09:30.437Z · LW · GW

But to clarify, this is not the reason why I 'might be concerned about bad incentives in this case', if you were wondering. 

Sounds like I misinterpreted the motivation behind your original comment! 

I ran out of energy to continue this thread/conversation, but feel free to clarify what you meant for others (if you think it isn't already clear enough for most readers).

Comment by Lukas_Gloor on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-09T01:51:29.413Z · LW · GW

I made an edit to my above comment to address your question; it's probably confusing that I used quotation marks for something that wasn't a direct quote by anyone.

Comment by Lukas_Gloor on Sam Altman's sister, Annie Altman, claims Sam has severely abused her · 2023-10-09T01:19:29.101Z · LW · GW

One benefit of boosting the visibility of accusations like this is that it makes it easier for others to come forward as well, should there be a pattern with other abuse victims. Or even just other people possibly having had highly concerning experiences of a non-sexual but still interpersonally exploitative nature.

If this doesn't happen, it's probabilistic evidence against the worst tail scenarios about his character traits, and it would be helpful if we could significantly discount those.

It's frustrating that we may never know, but one way to think about this is "we'd at least want to find out the truth in the worlds where it's easy to find out." 

Comment by Lukas_Gloor on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-08T23:05:09.927Z · LW · GW

I wasn't the one who downvoted your reply (seems fair to ask for clarifications), but I don't want to spend much more time on this and writing summaries isn't my strength. Here's a crude attempt at saying the same thing in fewer and different words:

IMO, there's nothing particularly "antithetical to LW aims/LW culture" (edit: "antithetical to LW aims/LW culture" is not a direct quote by anyone; but it's my summary interpretation of why you might be concerned about bad incentives in this case) about neuroticism-related "shortcomings." "Shortcomings" compared to a robotic ideal of perfect instrumental rationality. By "neuroticism-related "shortcomings"", I mean things like having triggers or being unusually affected by harsh criticism. It's therefore weird and a bit unfair to single out such neuroticism-related "shortcomings" over things like "being in bad shape" or "not being good at common life skills like driving a car." (I'm guessing that you wouldn't be similarly concerned about setting bad incentives if someone admitted that they were bad at driving cars or weren't in the best shape.) I'm only guessing here, but I wonder about rationalist signalling cascades about the virtues of rationality, where it gets rewarded to be particularly critical about things that least correspond to the image of what an ideally rational robot would be like. However, in reality, applied rationality isn't about getting close to some ideal image. Instead, it's about making the best out of what you have, taking the best next move step-by-step for your specific situation, always prioritizing what actually gets you to your goals rather than prioritizing "how do I look as though I'm very rational."

Not to mention that high emotionality confers advantages in many situations and isn't just an all-out negative. (See also TurnTrout's comment about rejecting the framing that this is an issue of his instrumental rationality being at fault.)

Comment by Lukas_Gloor on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-08T20:28:58.870Z · LW · GW

[...] I was about to upvote due to the courage required to post it publicly and stand behind it. But then I stopped and thought about the long term effects and it's probably best not to encourage this. [...] As ideally, you, along with the vast majority of potential readers, should become less emotionally reactive over time to any real or perceived insults, slights, etc...

It seems weird to single out this specific type of human limitation (compared to perfect-robot instrumental rationality) over the hundreds of others. If someone isn't in top physical shape or cannot drive cars under difficult circumstances or didn't renew their glasses and therefore doesn't see optimally, would you also be reluctant to upvote comments you were otherwise tempted to upvote (where they bravely disclose some limitation) because of this worry about poor incentives?

"Ideally," in a world where there's infinite time so there are no tradeoffs for spending self-improvement energy, rationalists would all be in shape, have brushed up their driving skills, have their glasses updated, etc. In reality, it's perfectly fine/rational to deprioritize many things that are "good to have" because other issues are more pressing, more immediately deserving of self-improvement energy. (Not to mention that rationality for its own sake is lame anyway and so many of us actually want to do object-level work towards a better future.) What to best focus on with self-improvement energy will differ a lot from person to person, not only because people have different strengths and weaknesses, but also because they operate in different environments. (E.g., in some environments, one has to deal with rude people all the time, whereas in others, this may be a rare occurrence.) For all these reasons, it seems weirdly patronizing to try to shape other people's prioritization for investing self-improvement energy.

This isn't to say that this site/community shouldn't have norms and corresponding virtues and vices. Since LW is about truth-seeking, it makes sense to promote virtues directly related to truth-seeking, e.g., by downvoting comments that exhibit poor epistemic practices. However, my point is that even though it might be tempting to discourage not just poor epistemic rationality but also poor instrumental rationality, these two work very differently, especially as far as optimal incentive-setting is concerned. Epistemic rationality is an ideal we can more easily enforce and get closer towards. Instrumental rationality, by contrast, is a giant jungle that people are coming into from all kinds of different directions. "Having unusually distracting emotional reactions to situations xyz" is one example of suboptimal instrumental rationality, but so is "being in poor physical shape," or "not being able to drive a car," or "not having your glasses updated," etc. I don't think it makes sense for the community to create a hierarchy of "most important facets of instrumental rationality" that's supposed to apply equally to all kinds of people. (Instead, I think it makes more sense to reward meta-skills of instrumental rationality, such as "try to figure out what your biggest problems are and really prioritize working on them.") (If we want to pass direct judgment on someone's prioritization of self-improvement energy, we need to know their exact situation and goals and the limitations they have, how good they are at learning various things, etc.)

Not to mention the unwelcoming effects when people get judged for limitations of instrumental rationality that the community for some reason perceives to be particularly bad. Such things are always more personal (and therefore more unfair) than judging someone for having made a clear error of reasoning (epistemic rationality).

(I say all of this as though it's indeed "very uncommon" to feel strongly hurt and lastingly affected by particularly harsh criticism. I don't even necessarily think that this is the case: If the criticism comes from a person with high standing in a community one cares about, it seems like a potentially quite common reaction?) 

Comment by Lukas_Gloor on Sam Altman's sister, Annie Altman, claims Sam has severely abused her · 2023-10-08T12:26:08.686Z · LW · GW

Strongly agree! 

I have mixed feelings about the convincingness of the accusations. Some aspects seem quite convincing to me, others very much not.

In most contexts, I'm still going to advocate for treating Sam Altman as though it's 100% that he's innocent, because that's what I think is the right policy in light of uncorroborated accusations. However, in the context of "should I look into this more or at least keep an eye on the possibility of dark triad psychology?," I definitely think this passes the bar of "yes, this is relevant to know."

I thought it was very strange to interpret this post as "gossip," as one commenter did.

Comment by Lukas_Gloor on Sam Altman's sister, Annie Altman, claims Sam has severely abused her · 2023-10-08T12:00:48.722Z · LW · GW

Note often time children don’t process sexual assault as an incredibly traumatic until years later.

The opposite is common, though. I know someone who had this happen to them, and they remembered that sexual assault felt distinctly very bad even before knowing what sex was. (And see my other comment on resurfacing memories.) 

Comment by Lukas_Gloor on Sam Altman's sister, Annie Altman, claims Sam has severely abused her · 2023-10-08T11:56:04.672Z · LW · GW

I know someone who recovered memories of repeated abuse including from the age of four later in their teenage years. The parents could corroborate a lot of circumstances around those memories, which suggests that they're likely broadly accurate. For instance, things like "they told their mother about the abuse when they were four, and the mother remembered that this conversation happened." Or "the parents spoke to the abuser and he basically admitted it." There was also suicidal ideation at around age six (similarity to Annie's story). In addition, the person remembers things like, when playing with children's toy figures (human-like animals), they would not play with these toy figures like ordinary children and instead think about plots that involve bleeding between legs and sexual assault. (This is much more detailed than Annie’s story, but remembering panic attacks as the first memory and having them as a child at least seems like evidence that she was strongly affected by something that had happened.)

Note that the person in question recovered these memories alone years before having any therapy.

It's probably easier to remember abuse (or for this to manifest itself in child behavior in detailed ways, like with the toy figures) when it's repeated. I think there’s a bunch of interpersonal variation also with respect to how people react to trauma. According to selfdecode (a service like 23andme), the person has alleles that make them unusually resilient to trauma, and yet they still struggled with cPTSD symptoms and the memories weren't always accessible.

Comment by Lukas_Gloor on Making AIs less likely to be spiteful · 2023-09-27T08:07:02.871Z · LW · GW

Great post; I suspect this to be one of the most tractable areas for reducing s-risks! 

I also like the appendix. Perhaps this is too obvious to state explicitly, but I think one reason why spite in AIs seems like a real concern is because it did evolve somewhat prominently in (some) humans. (And I'm sympathetic to the shard theory approach to understanding AI motivations, so it seems to me that the human evolutionary example is relevant. (That said, it's only analogous if there's a multi-agent-competition phase in the training, which isn't the case with LLM training so far, for instance.))

Comment by Lukas_Gloor on Sharing Information About Nonlinear · 2023-09-12T14:23:50.435Z · LW · GW

Practically, third parties who learn about an accusation will often have significant uncertainty about its accuracy. So, as a third party seeing Ben (or anyone else) make a highly critical post, I guess I could remain agnostic until the truth comes out one way or another, and reward/punish Ben at that point. That's certainly an option. Or, I could try to have some kind of bar of "how reasonable/unreasonable does an accusation need to seem to be defensible, praiseworthy, or out of line?" It's a tough continuum and you'll have communities that are too susceptible to witch hunts but also ones where people tend to play things down/placate over disharmony.

Comment by Lukas_Gloor on A non-magical explanation of Jeffrey Epstein · 2023-09-11T13:04:33.211Z · LW · GW

Perhaps there is some sort of guideline preventing him from speaking about this, but I have not heard of it. District Attorneys and the FBI publicly announce people were informants all of the time, as long as the people they're prosecuting are already prosecuted. They certainly don't swear an oath not to comment on the subject even in the event of the persons' death.


Yeah, this part seems odd to me. As Acosta, wouldn't you want to tell everyone unambiguously as soon as things are over that you had to "follow orders from high up"? Acosta looks like one of the worst people on the planet in this story. If he could use the excuse of following orders, instead of just being bribed or intimidated somehow, then he'd look slightly less bad? So, once rumours about intelligence agency involvement were started, it seems like it would be in Acosta's interest to give implicit support to these rumours without actually confirming them. 

Still, I guess my point doesn't explain why he explicitly claimed the Defense Department thing in the transition interviews. 

Comment by Lukas_Gloor on Sharing Information About Nonlinear · 2023-09-10T23:39:48.960Z · LW · GW

Yeah I agree with that perspective, but want to flag that I thought your original choice of words was unfortunate. It's very much a cost to be wrong when you voice strong criticism of someone's character or call their reputation into question in other ways (even if you flag uncertainty) – just that it's sometimes (often?) worse to do nothing when you're right. 

There's some room to discuss exact percentages. IMO, placing a 25% probability on someone (or some group) being a malefactor* is more than enough to start digging/gossiping selectively with the intent of gathering more evidence, but not always enough to go public? Sure, it's usually the case that "malefactors" cause harm to lots of people around them or otherwise distort epistemics and derail things, so there's a sense in which 25% probability might seem like it's enough from a utilitarian perspective of justice.** At the same time, I'd guess it's almost always quite easy (if you're correct!) to go from 25% to >50% with some proactive, diligent gathering of evidence (which IMO you've done very well), so, in practice, it seems good to have a norm that requires something more like >50% confidence.

Of course, the people who write as though they want you to have >95% confidence before making serious accusations probably haven't thought this through very well, because such a norm seems to provide terrible incentives and lets bad actors get away with things way too easily.

*It seems worth flagging that people can be malefactors in some social contexts but not others. For instance, someone could be a bad influence on their environment when they're gullibly backing up a charismatic narcissistic leader, but not when they're in a different social group or out on their own.

**In practice, I suspect that a norm where everyone airs serious accusations with only 25% confidence (and no further "hurdles to clear") would be worse than what we have currently, even on a utilitarian perspective of justice. I'd expect something like an autoimmune overreaction from the time sink issues of social drama and paranoia where people become too protective or insecure about their reputation (worsened by bad actors or malefactors using accusations as one of their weapons). So, the autoimmune reaction could become overall worse than what one is trying to protect the community from, if one is too trigger-happy.

Comment by Lukas_Gloor on Meta Questions about Metaphilosophy · 2023-09-01T11:08:51.520Z · LW · GW

I feel like there are two different concerns you've been expressing in your post history:

(1) Human "philosophical vulnerabilities" might get worsened (bad incentive setting, addictive technology) or exploited in the AI transition. In theory and ideally,  AI could also be a solution to this and be used to make humans more philosophically robust.

(2) The importance of "solving metaphilosophy" and why doing so would help us with (1).

My view is that (1) is very important and you're correct to highlight it as a focus area we should do more in. For some specific vulnerabilities or failure modes, I wrote a non-exhaustive list here in this post under the headings "Reflection strategies require judgment calls" and "Pitfalls of reflection procedures." Some of it was inspired by your LW comments.

Regarding (2), I think you overestimate how difficult the problem is. My specific guess is that you might overestimate its difficulty because you might be confusing uncertainty (about a problem that has objective solutions) with indecisiveness (between mutually incompatible ways of reasoning). Uncertainty and indecisiveness may feel similar from the inside, but they call for different ways of moving forward.

I feel like you already know all there is to know about metaphilosophical disagreements and solution attempts. When I read your posts, I don't feel like "oh, I know more than Wei Dai does." But then you seem uncertain between positions that I don't feel uncertain about, and I'm not sure what to make of that. I subscribe to the view of philosophy as "answering confused questions." I like the following Wittgenstein quote:

[...] philosophers do not—or should not—supply a theory, neither do they provide explanations. “Philosophy just puts everything before us, and neither explains nor deduces anything. Since everything lies open to view there is nothing to explain (PI 126).”

As I said elsewhere, on this perspective I see the aim of [...] philosophy as accurately and usefully describing our option space: the different questions worth asking and how we can reason about them.

This view also works for metaphilosophical disagreements. 

There's a brand of philosophy (often associated with Oxford) that's incompatible with the Wittgenstein quote because it relies on concepts that will always remain obscure, like "objective reasons" or "objective right and wrong," etc. The two ways of doing philosophy seem incompatible because one of them is all about concepts that the other doesn't allow. But if you apply the perspective from the Wittgenstein quote to the metaphilosophical disagreement between the "Wittgensteinian view" and the "objective reasons views," then you're simply choosing between two different games to play. Do you want to go down the path of increased clarity and clearly posed questions, or do you want to go all-in on objective reasons? You have to pick one or the other.

For what it's worth, I feel like the prominent alignment researchers in the EA community almost exclusively reason about philosophy in the anti-realist, reductionist style. I'm reminded of Dennett's "AI makes philosophy honest." So, if we let alignment researchers label the training data, I'm optimistic that I'd feel satisfied with the "philosophy" we'd get out of it, conditional on solving alignment in an ambitious and comprehensive way.

Other parts of this post (the one I already linked to above) might be relevant to our disagreement, specifically with regard to the difference between uncertainty and indecisiveness. 

Comment by Lukas_Gloor on My tentative best guess on how EAs and Rationalists sometimes turn crazy · 2023-06-22T23:54:11.269Z · LW · GW

Oh, I was replying to Iceman – mostly this part that I quoted:  

If you have galaxy brained the idea of the St. Petersberg Paradox, it seems like Alameda style fraud is +EV.

(I think I've seen similar takes by other posters in the past.)

I should have mentioned that I'm not replying to you. 

I think I took such a long break from LW that I forgot that you can make subthreads rather than just continue piling on at the end of a thread.

 

Comment by Lukas_Gloor on My tentative best guess on how EAs and Rationalists sometimes turn crazy · 2023-06-22T23:46:47.839Z · LW · GW

If you have galaxy brained the idea of the St. Petersberg Paradox, it seems like Alameda style fraud is +EV.

I don't think so. At the very least, it seems debatable. Biting the bullet on the St. Petersburg paradox doesn't mean taking negative-EV bets. House-of-cards stuff ~never turns out well in the long run, and the fallout from an implosion also grows as you double down. Everything that's coming to light about FTX indicates it was a total house of cards. It seems really unlikely to me that most of these bets were positive even on fanatically risk-neutral, act-utilitarian grounds.
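
To make this concrete, here's a minimal toy sketch in Python (the numbers are invented for illustration and the model isn't meant to capture anything about FTX specifically, just the structure of the argument): each round you bet the whole pile on a double-or-nothing gamble that is positive-EV in isolation, but an implosion also causes collateral fallout that scales with the size of the pile at the moment it collapses. Under those assumptions, the EV of the keep-doubling-down strategy turns negative after only a few rounds.

```python
# Toy model (illustrative assumptions only): repeatedly bet the whole "pile"
# on a double-or-nothing gamble that is positive-EV per bet, where busting
# also causes collateral fallout proportional to the pile at that moment.

def ev_double_down(p_win: float, fallout_ratio: float, n_rounds: int) -> float:
    """EV of playing n_rounds of all-in double-or-nothing, starting from 1 unit.

    p_win:         probability each bet succeeds (per-bet EV multiplier = 2 * p_win)
    fallout_ratio: collateral damage from an implosion, as a multiple of the
                   pile that existed when the bet failed
    """
    ev = 0.0
    survive_prob = 1.0
    pile = 1.0
    for _ in range(n_rounds):
        # Busting this round: lose the pile and cause fallout proportional to it.
        ev -= survive_prob * (1 - p_win) * fallout_ratio * pile
        survive_prob *= p_win
        pile *= 2
    # Surviving every round: walk away with the doubled-up pile.
    ev += survive_prob * pile
    return ev

for n in (1, 2, 4, 8, 16):
    print(n, round(ev_double_down(p_win=0.6, fallout_ratio=1.0, n_rounds=n), 2))
# Per-bet EV multiplier is 1.2, yet the strategy's EV goes negative after a few
# rounds: roughly 0.8, 0.56, -0.07, -2.3, -16.49 for 1, 2, 4, 8, 16 rounds.
```

The point is just that "each bet looks positive-EV in isolation" and "the overall doubling-down strategy is positive-EV" can come apart once the cost of an implosion scales with how far you've doubled down.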

Maybe I'm biased because it's convenient to believe what I believe (that the instrumentally rational action is almost never "do something shady according to common-sense morality"). Let's say it's defensible to see things otherwise. Even then, I find it weird that, because Sam had these views on St. Petersburg-style gambles, people speak as though this explains everything about FTX's epistemics: "That was excellent instrumental rationality we were seeing on display from FTX leadership, granted that they don't care about common-sense morality and bite the bullet on St. Petersburg." At the very least, we should name and consider the other hypothesis, on which the St. Petersburg views were more incidental (though admittedly still "characteristic"). On that hypothesis, there's a specific type of psychology that makes people think they're invincible, which leads them to take bets that are negative on any defensible interpretation of decision-making under uncertainty.