AI alignment researchers may have a comparative advantage in reducing s-risks 2023-02-15T13:01:50.799Z
Moral Anti-Realism: Introduction & Summary 2022-04-02T14:29:01.751Z
Moral Anti-Epistemology 2015-04-24T03:30:27.972Z
Arguments Against Speciesism 2013-07-28T18:24:58.354Z


Comment by Lukas_Gloor on Sharing Information About Nonlinear · 2023-09-12T14:23:50.435Z · LW · GW

Practically, third parties who learn about an accusation will often have significant uncertainty about its accuracy. So, as a third party seeing Ben (or anyone else) make a highly critical post, I guess I could remain agnostic until the truth comes out one way or another, and reward/punish Ben at that point. That's certainly an option. Or, I could try to set some kind of bar: "how reasonable or unreasonable does an accusation need to seem for it to count as defensible, praiseworthy, or out of line?" It's a tough continuum: some communities are too susceptible to witch hunts, while in others people tend to play things down and smooth over disharmony.

Comment by Lukas_Gloor on A non-magical explanation of Jeffrey Epstein · 2023-09-11T13:04:33.211Z · LW · GW

Perhaps there is some sort of guideline preventing him from speaking about this, but I have not heard of it. District Attorneys and the FBI publicly announce people were informants all of the time, as long as the people they're prosecuting are already prosecuted. They certainly don't swear an oath not to comment on the subject even in the event of the persons' death.

Yeah, this part seems odd to me. If you were Acosta, wouldn't you want to tell everyone unambiguously, as soon as things were over, that you had to "follow orders from high up"? Acosta looks like one of the worst people on the planet in this story. If he could use the excuse of following orders, instead of having been bribed or intimidated somehow, he'd look slightly less bad, wouldn't he? So, once rumours about intelligence agency involvement had started, it seems like it would be in Acosta's interest to give implicit support to these rumours without actually confirming them.

Still, I guess my point doesn't explain why he explicitly claimed the Defense Department thing in the transition interviews. 

Comment by Lukas_Gloor on Sharing Information About Nonlinear · 2023-09-10T23:39:48.960Z · LW · GW

Yeah, I agree with that perspective, but I want to flag that I thought your original choice of words was unfortunate. Being wrong is very much a cost when you voice strong criticism of someone's character or call their reputation into question in other ways (even if you flag uncertainty); it's just that doing nothing when you're right is sometimes (often?) worse.

There's some room to discuss exact percentages. IMO, placing a 25% probability on someone (or some group) being a malefactor* is more than enough to start digging or gossiping selectively with the intent of gathering more evidence, but not always enough to go public? Sure, "malefactors" usually cause harm to lots of people around them or otherwise distort epistemics and derail things, so there's a sense in which 25% probability might seem like enough from a utilitarian perspective of justice.** At the same time, I'd guess it's almost always quite easy (if you're correct!) to go from 25% to >50% with some proactive, diligent gathering of evidence (which IMO you've done very well), so, in practice, it seems good to have a norm that requires something more like >50% confidence.

Of course, people who write as though they want you to have >95% confidence before making serious accusations probably haven't thought this through very well: such a bar seems to provide terrible incentives and lets bad actors get away with things way too easily.
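To make the gap between these confidence thresholds concrete, here is a minimal sketch of Bayes' rule in odds form (posterior odds = prior odds × likelihood ratio). It assumes, purely for illustration, that everything learned while investigating can be summarized as a single combined likelihood ratio; the function names are hypothetical.

```python
def to_odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds in favor."""
    return p / (1.0 - p)


def to_probability(odds: float) -> float:
    """Convert odds in favor back to a probability."""
    return odds / (1.0 + odds)


def update(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return to_probability(to_odds(prior) * likelihood_ratio)


def required_likelihood_ratio(prior: float, target: float) -> float:
    """Combined likelihood ratio the evidence must reach to move prior -> target."""
    return to_odds(target) / to_odds(prior)


if __name__ == "__main__":
    for target in (0.50, 0.95):
        lr = required_likelihood_ratio(0.25, target)
        print(f"25% -> {target:.0%}: evidence must be ~{lr:.0f}x likelier if the accusation is true")
```

Under this toy framing, getting from 25% to 50% needs evidence about 3 times likelier if the accusation is true than if it's false, while getting to 95% needs a ratio of about 57, which illustrates why a >95% bar is so much harder to clear.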

*It seems worth flagging that people can be malefactors in some social contexts but not others. For instance, someone could be a bad influence on their environment when they're gullibly backing up a charismatic narcissistic leader, but not when they're in a different social group or out on their own.

**In practice, I suspect that a norm where everyone airs serious accusations with only 25% confidence (and no further "hurdles to clear") would be worse than what we have currently, even from a utilitarian perspective of justice. I'd expect something like an autoimmune overreaction: social drama becomes a time sink, and paranoia spreads as people grow overly protective of or insecure about their reputations (made worse by bad actors or malefactors using accusations as one of their weapons). So, if a community is too trigger-happy, the autoimmune reaction could end up worse overall than what it's trying to protect itself from.

Comment by Lukas_Gloor on Meta Questions about Metaphilosophy · 2023-09-01T11:08:51.520Z · LW · GW

I feel like there are two different concerns you've been expressing in your post history:

(1) Human "philosophical vulnerabilities" might get worsened (bad incentive setting, addictive technology) or exploited in the AI transition. In theory, and ideally, AI could also be a solution to this and be used to make humans more philosophically robust.

(2) The importance of "solving metaphilosophy," and why doing so would help us with (1).

My view is that (1) is very important and you're correct to highlight it as a focus area we should do more in. For some specific vulnerabilities or failure modes, I wrote a non-exhaustive list in this post under the headings "Reflection strategies require judgment calls" and "Pitfalls of reflection procedures." Some of it was inspired by your LW comments.

Regarding (2), I think you overestimate how difficult the problem is. Specifically, I suspect you overestimate its difficulty because you might be confusing uncertainty about a problem that has objective solutions with indecisiveness between mutually incompatible ways of reasoning. Uncertainty and indecisiveness may feel similar when you're in that mental state, but they call for different ways of moving forward.

I feel like you already know all there is to know about metaphilosophical disagreements or solution attempts. When I read your posts, I don't feel like "oh, I know more than Wei Dai does." But then you seem uncertain between things that I don't feel uncertain about, and I'm not sure what to make of that. I subscribe to the view of philosophy as "answering confused questions." I like the following Wittgenstein quote:

[...] philosophers do not—or should not—supply a theory, neither do they provide explanations. “Philosophy just puts everything before us, and neither explains nor deduces anything. Since everything lies open to view there is nothing to explain (PI 126).”

As I said elsewhere, per this perspective, I see the aim of [...] philosophy as to accurately and usefully describe our option space – the different questions worth asking and how we can reason about them.

This view also works for metaphilosophical disagreements. 

There's a brand of philosophy (often associated with Oxford) that's incompatible with the Wittgenstein quote because it uses concepts that will always remain obscure, like "objective reasons" or "objective right and wrong." The two ways of doing philosophy seem incompatible because one of them is all about concepts that the other doesn't allow. But if you apply the perspective from the Wittgenstein quote to the metaphilosophical disagreement between the "Wittgensteinian view" and "objective reasons views," then you're simply choosing between two different games to play. Do you want to go down the path of increased clarity and clear questions, or do you want to go all-in on objective reasons? You gotta pick one or the other.

For what it's worth, I feel like the prominent alignment researchers in the EA community almost exclusively reason about philosophy in the anti-realist, reductionist style. I'm reminded of Dennett's "AI makes philosophy honest." So, if we let alignment researchers label the training data, I'm optimistic that I'd feel satisfied with the "philosophy" we'd get out of it, conditional on solving alignment in an ambitious and comprehensive way.

Other parts of this post (the one I already linked to above) might be relevant to our disagreement, specifically with regard to the difference between uncertainty and indecisiveness. 

Comment by Lukas_Gloor on My tentative best guess on how EAs and Rationalists sometimes turn crazy · 2023-06-22T23:54:11.269Z · LW · GW

Oh, I was replying to Iceman – mostly this part that I quoted:  

If you have galaxy brained the idea of the St. Petersberg Paradox, it seems like Alameda style fraud is +EV.

(I think I've seen similar takes by other posters in the past.)

I should have mentioned that I'm not replying to you. 

I think I took such a long break from LW that I forgot that you can make subthreads rather than just continue piling on at the end of a thread.


Comment by Lukas_Gloor on My tentative best guess on how EAs and Rationalists sometimes turn crazy · 2023-06-22T23:46:47.839Z · LW · GW

If you have galaxy brained the idea of the St. Petersberg Paradox, it seems like Alameda style fraud is +EV.

I don't think so. At the very least, it seems debatable. Biting the bullet in the St Petersburg paradox doesn't mean taking negative-EV bets. House-of-cards stuff ~never turns out well in the long run, and the fallout from an implosion also grows as you double down. Everything that's coming to light about FTX indicates it was a total house of cards. It seems really unlikely to me that most of these bets were positive-EV, even on fanatically risk-neutral, act-utilitarian grounds.

Maybe I'm biased because it's convenient to believe what I believe (that the instrumentally rational action is almost never "do something shady according to common-sense morality"). Let's say it's defensible to see things otherwise. Even then, I find it weird that, because Sam had these views on St Petersburg stuff, people speak as though this explains everything about FTX's epistemics, as if to say: "That was excellent instrumental rationality we were seeing on display by FTX leadership, granted that they don't care about common-sense morality and bite the bullet on St Petersburg." At the very least, we should name and consider the other hypothesis, on which the St Petersburg views were more incidental (though admittedly still "characteristic"). On that other hypothesis, there's a specific type of psychology that makes people think they're invincible, which leads them to take negative-EV bets on any defensible interpretation of decision-making under uncertainty.

Comment by Lukas_Gloor on My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI · 2023-05-24T15:23:25.814Z · LW · GW

My comment above was mostly coming from a feeling of being upset, so I'm writing a second comment here to excavate why I feel strongly about this (and decide whether I stand by it on reflection).

I think I care about this because I'm concerned that we're losing the ability to distinguish people who are worth learning from ("genuine experts") from people who have a platform and an overconfident personality. With this concern in mind, I don't want to let it slide when someone lowers the standards of discourse to an arbitrary degree without losing any reputation. (I would say the same thing about some AI safety advocates.) Of course, I agree it reflects badly on AI safety advocates if they're needlessly making it harder for critics to keep an open mind. Stop doing that. At the same time, it also reflects badly on Meta, and on the way the media operates ("who qualifies as an expert?"), that Meta's chief AI scientist, who gets interviewed a lot, has some of the worst takes on the topic I've ever seen. That's scary all by itself, regardless of how we got here.

Comment by Lukas_Gloor on My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI · 2023-05-24T14:41:06.063Z · LW · GW

I think you're giving LeCun way too much credit if you're saying his arguments are so bad now because other people around him were hostile and engaged with him in a low-quality way. Maybe those things were true, but that doesn't excuse stuff like repeating bad arguments after their flaws have been pointed out, or confidently proclaiming that we have nothing to worry about based on arguments that obviously don't hold up.

Comment by Lukas_Gloor on Romance, misunderstanding, social stances, and the human LLM · 2023-05-05T10:55:20.812Z · LW · GW

There's a difference between "having a desire for limerence" and "being the person capable of developing limerence." Some people may not have a desire for it, but they get limerent pretty quickly with the right triggers. (Some people may even hate the fact that their brain does this because it keeps getting them into bad situations, but they keep developing limerence and are a slave to it.)

This led me to research it, and it seems like limerence is a highly culture specific, and is likely more a cultural meme than an emotion inherent to human brains.

It's a distinct emotional state comparable to being on a powerful drug, so it can't just be a cultural meme. Of course, the frequency with which the emotional state is elicited could be culture-dependent (just like some cultures have a higher or lower prevalence of depression). What's also culture-dependent is whether you romanticize limerence or look at it as something dysfunctional. As you mention, some people seem to think good romance requires limerence. I think that's irrational (unless you care more about the "hedonics" of being in love than about finding someone who's actually compatible).

I agree that there's a connection between limerence and drama, though the reasons are indirect and correlational rather than limerence being defined through drama.

If I were to guess, I would say limerence is a side effect of the emotional and sexual frustration of the young and inexperienced humans who dabble in their first relationships,

I suspect the same thing. I think it might have to do with unmet needs and the fantasy of fulfilling them all through this one ideal person you've met (whom you don't really know yet, but onto whom you're projecting everything that could fix your loneliness/pain). (It can be completely non-sexual.) What I don't understand is how you go from the description "side effect of frustration" to "it's a cultural meme." Depression is an emotional state that we could also describe as a side effect of unmet needs, but this doesn't make it a cultural meme.

I think there might also be a lot of genetic variation in people's propensity to develop limerence?

Comment by Lukas_Gloor on Why aren’t more of us working to prevent AI hell? · 2023-05-04T19:05:49.093Z · LW · GW

This could be true as a reason why some people de-prioritize s-risks, but I don't think the statement itself is correct. See the section "s-risk reduction is separate from alignment work" here.

Comment by Lukas_Gloor on Why aren’t more of us working to prevent AI hell? · 2023-05-04T18:37:16.410Z · LW · GW

Related to the "personal fit" explanation: I'd argue that the skills required to best reduce s-risks overlap considerably with the skills needed to make alignment progress (see here).

At least, I think this goes for directly AI-related s-risks, which I consider most concerning, though I put significantly lower probabilities on them than you do.

For s-risks conditioned on humans staying in control over the future, we perhaps wouldn't gain much from explicitly modelling AI takeoff and engaging in all the typical longtermist thought. Therefore, some things that reduce future disvalue don't have to look like longtermism? For instance, common-sense ways to improve society's rationality, coordination abilities, and values. (Maybe there's a bit of leverage to gain from thinking explicitly about how AI will change things.) The main drawbacks of those types of interventions are that (1) the disvalue at stake might be smaller than the disvalue for directly AI-related s-risks (conditional on those scenarios playing out), and (2) how society thinks and what we value only matters if humans actually stay in control over the future, which arguably seems pretty unlikely.

Comment by Lukas_Gloor on The 0.2 OOMs/year target · 2023-04-17T20:54:51.231Z · LW · GW

Otherwise you have a constant large compute overhang.

I think we should strongly consider finding a way of dealing with that rather than only looking at solutions that produce no overhang. For all we know, total compute required for TAI (especially factoring in future algorithmic progress) isn't far away from where we are now. Dealing with the problem of preventing defectors from exploiting a compute overhang seems potentially easier than solving alignment on a very short timescale. 

Comment by Lukas_Gloor on Moderation notes re: recent Said/Duncan threads · 2023-04-15T23:54:05.359Z · LW · GW

Said's way of asking questions, and the uncharitable assumptions he sometimes makes, are among the most off-putting things I associate with LW. I don't find it okay myself, but it seems like the sort of thing that's hard to pin down with legible rules. Like, if he were to ask me "what is it that you don't like, exactly?", I'd find it hard to pin down.

Edit: So, on the topic of moderation policy, letting individual users ban specific other users (if they have trouble dealing with their style, or just if conflicts happen) seems like a good solution to me. And I don't think it should reflect poorly on the banner (unless they ban an extraordinary number of other users).

Comment by Lukas_Gloor on Evolution provides no evidence for the sharp left turn · 2023-04-11T21:21:29.381Z · LW · GW

I like the reasoning behind this post, but I'm not sure I buy the conclusion. Here's an attempt at excavating why not:

If I may try to paraphrase, I'd say your argument has two parts:

(1) Humans had a "sharp left turn" not because of some underlying jump in brain capabilities, but because of shifting from one way of gaining capabilities to another (from solo learning to culture).

(2) Contemporary AI training is more analogous to "already having culture," so we shouldn't expect that things will accelerate in ways ML researchers don't already anticipate based on trend extrapolations.

Accordingly, we shouldn't expect AIs to get a sharp left turn.

I think I buy (1) but I'm not sure about (2). 

Here's an attempt at arguing that AI training will still get a "boost from culture." If I'm right, the AIs' "boost from culture" could even be larger than it was for early humans, because we now have a massive culture overhang.

Or maybe "culture" isn't exactly the right thing, and the better phrase is something like "generality-and-stacking-insights-on-top-of-each-other threshold from deep causal understanding." If we look at human history, it's not just the start of cultural evolution that stands out; it's also the scientific revolution! (A lot of cultural evolution worked despite individual humans not understanding why they do the things they do [Henrich's "The Secret of Our Success"]. By contrast, science is different and requires at least some scientists to deeply understand what they're doing.)

My intuition is that there's an "intelligence" threshold past which all the information on the internet suddenly becomes a lot more useful. When Nate/MIRI speak of a "sharp left turn," my guess is that they mean some understanding-driven thing. (And it has less to do with humans following seemingly convoluted rules about food preparation whose purpose they don't even understand, but which somehow prevent them from poisoning themselves.) It's not "culture" per se, but we needed culture to get there (and maybe it matters "what kind of culture," e.g., education with scientific mindware).

Elsewhere, I expressed it as follows:

I suspect that there’s a phase transition that happens when agents get sufficiently good at what Daniel Kokotajlo and Ramana Kumar call “P₂B” (a recursive acronym for “Plan to P₂B Better”). When it comes to “intelligence,” it seems to me that we can distinguish between “learning potential” and “trained/crystallized intelligence” (or “competence”). Children who grow up in an enculturated/learning-friendly setting (as opposed to, e.g., feral children or Helen Keller before she met her teacher) reach a threshold where their understanding of the world and their thoughts becomes sufficiently deep to kickstart a feedback loop. Instead of aimlessly absorbing what’s around them, they prioritize learning the skills and habits of thinking that seem beneficial according to their goals. In this process, slight differences in “learning potential” can significantly affect where a person ends up in their intellectual prime. So, “learning potential” may be gradual, but above a specific threshold (humans above, chimpanzees below), there’s a discontinuity in how it translates to “trained/crystallized intelligence” after a lifetime of (self-)directed learning. Moreover, it seems that we can tell that the slope of the graph (y-axis: “trained/crystallized intelligence;” x-axis: “learning potential”) around the human range is steep.

To quote something I’ve written previously:

“If the child in the chair next to me in fifth grade was slightly more intellectually curious, somewhat more productive, and marginally better dispositioned to adopt a truth-seeking approach and self-image than I am, this could initially mean they score 100%, and I score 95% on fifth-grade tests – no big difference. But as time goes on, their productivity gets them to read more books, their intellectual curiosity and good judgment get them to read more unusually useful books, and their cleverness gets them to integrate all this knowledge in better and increasingly more creative ways. [...] By the time we graduate university, my intellectual skills are mostly useless, while they have technical expertise in several topics, can match or even exceed my thinking even on areas I specialized in, and get hired by some leading AI company.


If my 12-year-old self had been brain-uploaded to a suitable virtual reality, made copies of, and given the task of devouring the entire internet in 1,000 years of subjective time (with no aging) to acquire enough knowledge and skill to produce novel and for-the-world useful intellectual contributions, the result probably wouldn’t be much of a success. If we imagined the same with my 19-year-old self, there’s a high chance the result wouldn’t be useful either – but also some chance it would be extremely useful. [...]  I think it’s at least plausible that there’s a jump once the copies reach a level of intellectual maturity to make plans which are flexible enough [...] and divide labor sensibly [...].”

In other words, I suspect there’s a discontinuity at the point where the P₂B feedback loop hits its critical threshold.

So, my intuition here is that we'll see a phase change once AIs reach the kind of deeper understanding that allows them to form better learning strategies. That phase transition will be similar in kind to going from no culture to culture, but it's more like "AIs suddenly grokking rationality/science to a sufficient degree that they can stack insights reliably enough to avoid deteriorating results." (Once they grok it, the update permeates everything they've read; since they read large parts of the internet, the result will be massive.)
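The threshold intuition above can be illustrated with a deliberately crude toy model. Everything in it is a hypothetical assumption, not a claim about real cognition or about how Kokotajlo and Kumar model P₂B: below a critical level of competence, learning adds a fixed amount per year ("aimless absorption"); above it, gains scale with what the learner already knows (the feedback loop kicks in). The function name and all parameter values are made up for illustration.

```python
def crystallized_intelligence(
    learning_potential: float,
    years: int = 40,
    critical_competence: float = 2.0,
    rate: float = 0.1,
) -> float:
    """Toy model of competence after `years` of learning (all parameters hypothetical)."""
    competence = 1.0
    for _ in range(years):
        if competence >= critical_competence:
            # Feedback loop active: the learner chooses what to learn based on
            # what they already understand, so gains compound.
            competence += rate * learning_potential * competence
        else:
            # Below the threshold: gains are linear and don't compound.
            competence += rate * learning_potential
    return competence


if __name__ == "__main__":
    for potential in (0.2, 0.3, 1.0, 1.1):
        final = crystallized_intelligence(potential)
        print(f"learning potential {potential:.1f} -> competence {final:.1f}")
```

With these made-up parameters, a learner with potential 0.2 never crosses the threshold within 40 years, while a 10% edge in potential around 1.0 turns into a considerably larger edge in final competence. That reproduces the shape of the claim (a steep slope once the loop is active); it is not evidence for it.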

I'm not sure what all this implies about values generalizing to new contexts / matters of alignment difficulty. You seem open to the idea of fast takeoff through AIs improving training data, which seems related to my notion of "AIs get smart enough to notice on their own what type of internet-text training data is highest quality vs what's dumb or subtly off." So, maybe we don't disagree much and your objection to the "sharp left turn" concept has to do with the connotations it has for alignment difficulties.

Comment by Lukas_Gloor on Updating my AI timelines · 2023-03-29T16:36:42.908Z · LW · GW

"Effective compute" is the combination of hardware growth and algorithmic progress? If those are multiplicative rather than additive, slowing one of the factors may only accomplish little on its own, but maybe it could pave the way for more significant changes when you slow both at the same time? 

Unfortunately, it seems hard to significantly slow algorithmic progress. I can think of changes to publishing behaviors (and improving security) and pausing research on scary models (for instance via safety evals). Maybe things like handicapping talent pools via changes to immigration policy, or encouraging capability researchers to do other work. But that's about it. 

Still, combining different measures could be promising if the effects are multiplicative rather than additive. 

Edit: Ah, but I guess your point is that even a 100% tax on compute wouldn't really change the slope of the compute growth curve – it would only move the curve rightward and delay a little. So we don't get a multiplicative effect, unfortunately. We'd need to find an intervention that changes the steepness of the curve.   
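The point in this edit can be sketched numerically. This assumes, for illustration only, that effective compute grows at a constant rate measured in orders of magnitude (OOMs) per year and that a 100% tax simply doubles the price of compute for a fixed budget; the growth rates and function names are hypothetical. A one-off price factor lowers the curve by a constant log10(factor) OOMs, which delays every milestone by the same fixed number of years, whereas lowering the growth rate opens a gap that keeps widening.

```python
import math


def delay_from_price_factor(price_factor: float, growth_ooms_per_year: float) -> float:
    """Years by which any given compute level is reached later if compute costs
    `price_factor` times more (e.g. 2.0 for a 100% tax), with growth unchanged."""
    return math.log10(price_factor) / growth_ooms_per_year


def gap_from_slower_growth(old_rate: float, new_rate: float, years: float) -> float:
    """OOMs of effective compute forgone after `years` if the growth rate drops."""
    return (old_rate - new_rate) * years


if __name__ == "__main__":
    rate = 0.5  # hypothetical growth rate, OOMs of effective compute per year
    print(f"100% tax: every milestone arrives {delay_from_price_factor(2.0, rate):.2f} years later")
    for years in (1, 5, 10, 20):
        gap = gap_from_slower_growth(0.5, 0.4, years)
        print(f"growth cut from 0.5 to 0.4 OOMs/yr: {gap:.1f} OOMs behind after {years} years")
```

With the hypothetical 0.5 OOMs/year rate, the 100% tax costs only about 0.6 years and the cost never grows, while even a modest change in slope keeps compounding, which is why an intervention that changes the steepness of the curve would matter more.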

Comment by Lukas_Gloor on Tabooing "Frame Control" · 2023-03-20T16:12:00.875Z · LW · GW

At first maybe you try to argue with them about it. But over time, a) you find yourself not bothering to argue with them

>Whose fault is that, exactly…?

b) even when you do argue with them, they’re the ones choosing the terms of the argument.


If they think X is important, you find yourself focused on argue whether-or-not X is true, and ignoring all the different Ys and Zs that maybe you should have been thinking about.



I agree that nothing about the examples you quote is unacceptably bad – all these things are "socially permissible." 

At the same time, your "Whose fault is that, exactly...?" makes it seem like there's nothing the guru in question could be doing differently. That's false.

Sure, some people are okay with seeing all social interactions as something where everyone is in it for themselves. However, in close(r) relationship contexts (e.g. friendships, romantic relationships, probably also spiritual mentoring from a guru?), many operate on the assumption that people care about each other and want to preserve each other's agency and help each other flourish. In that context, it's perfectly okay to have an expectation that others will (1) help me notice and speak up if something doesn't quite feel right to me (as opposed to keeping quiet) and (2) help me arrive at informed/balanced views after carefully considering alternatives, as opposed to only presenting me their terms of the argument.

If the guru never says "I care about you as a person," he's fine to operate as he does. But once he starts to reassure his followers that he always has their best interest in mind – that's when he crosses the line into immoral, exploitative behavior. 

You can't have it both ways. If your answer to people getting hurt is always "well, whose fault was that?" 

Then don't ever fucking reassure them that you care about them!

In reality, I'm pretty sure "gurus" almost always go to great lengths to convince their followers that they care more about them than almost anyone else does. That's where things become indefensible.

Comment by Lukas_Gloor on Tabooing "Frame Control" · 2023-03-20T15:47:08.321Z · LW · GW

Compare the two: 

(1) The difference between "bad frame control" and "good frame control" is that, in the latter, the frame matches physical reality and social reality.

Here, I use "social reality" in the sense of "insights about what types of actions or norms help people flourish."

(2) The difference between lying and telling the truth is that, when someone doesn't lie, what they say matches physical reality and social reality.

I feel like there's a sense in which (1) is true, but it misses the point if someone thinks this is the only difference. If you lie a lot about some subject matter, or if you manipulate someone with what aella calls frame control, there's always some amount of friction around the subject matter or around the frame that you introduce. This friction wouldn't be there if you were going with the truth. The original post points out how frame controllers try to hide that sort of friction or bring down your defenses against it. Those subtleties are what's bad about the bad kind of frame control, and noticing them is what it's all about.

Someone might object as follows: 

"Friction" can mean many things. If you try to push people to accomplish extraordinary feats with your weird-seeming startup, you have to motivate them and push against various types of "friction" – motivate your co-workers, make them okay with being seen as weird as long as your idea hasn't succeeded, etc.

I agree with all that. Good leaders have to craft motivating frames and inspire others with their vision. But I still feel like that's not the same thing as what happens in (the bad kind of) frame control. The word "control" is a clue about where the difference lies. It's hard to pin down the exact difference. Maybe it's something like this: 

Good leadership is about offering your followers frames that create win-win situations (for them and for the world!) by appealing to virtues they already endorse, deliberately drawing attention to all the places where there's friction from social conventions or inertia/laziness, and presenting a convincing vision of why it's worth pushing against that friction.

By contrast, frame control (the bad, sneaky/coercive kind) is about guilting people into thinking it's their fault if they struggle because of the friction, or trying to keep them from noticing that alternative frames are available to them.

Comment by Lukas_Gloor on Frame Control · 2023-03-20T11:53:38.467Z · LW · GW

Hm, maybe. I can see that frame control comes in handy when you're a general in a war or the CEO of a startup (and probably at least some generals or CEOs are good people with good effects on the world). However, in wartime, having to convince your soldiers to march to their deaths feels like a necessary evil. And in startups, I don't know: cultishness can have its advantages, but I feel like the best leadership does NOT turn your underlings into people who look cultish to outsiders. So, I think the good version of frame control is generally weaker than the bad version, for instance because good leaders have nothing to fear from their followers becoming better at passing Ideological Turing Tests for opposing views. But I guess that's just expressing your point in different words: we can say that, if our frame is aligned with physical reality and avoids negative social outcomes, the people who buy into it shouldn't look like cultists.

I also think it's informative to think about the context of a romantic relationship. In that context, I'm not sure there's a version of "good frame control" that's necessary, except maybe for frames like "good communication is important." If one person has so far struggled to express their needs because they weren't taken seriously in the past, it can be good for both individuals if the more securely attached person pushes that kind of frame. However, the way you would do that isn't by repeating "good communication is important" as a mantra or as a weapon to shame the other person for not communicating the way you want! Instead, you try showing them the benefits of good communication, convincing them through evidence of how nice it feels when it works. That's very different from the bad type of frame control in relationships. Also, let's say you have two people who already understand that good communication is important. Then no one is exerting any frame control; you simply have two happy people who live in the same healthy frame. And insofar as they craft features of their personal "relationship frame," it's a mutual sort of thing, so no one is exactly exerting any sort of control.

These examples, and the fact that you can have relationships (not just romantic ones) where something feels mutual rather than "control exerted by one party," make me think that there's more to it than "good frame control differs from bad frame control merely in terms of correspondence to physical reality (and social reality)." I guess it depends what we mean by "social reality." I think bad frame control is primarily about a lack of empathy, and that happens to leave a very distinct pattern, which you simply can't compare to "good leadership."

Edit: I saw another commenter making a good point in reply to your comment. What you call "good frame control" is done out in the open. The merits of good frames are often self-evident or at least verifiable. By contrast, the OP discusses (bad) frame control as a type of sneak attack. It tries to overcome your epistemic defenses.

Comment by Lukas_Gloor on Wittgenstein's Language Games and the Critique of the Natural Abstraction Hypothesis · 2023-03-16T11:07:17.364Z · LW · GW

Yeah, what I meant was the belief that there's no incorrect way to set up a language game.

Comment by Lukas_Gloor on Wittgenstein's Language Games and the Critique of the Natural Abstraction Hypothesis · 2023-03-16T09:52:46.356Z · LW · GW

and some audiences have measurably better calibration.

It's not straightforward in all contexts to establish what counts as good calibration. It's straightforward for empirical forecasting, but if we were to come up with a notion like "good calibration for ethical judgments," we'd have to make some pretty subjective judgment calls. Similarly, something like "good calibration for coming up with helpful abstractions for language games" (which we might call "doing philosophy" or a subskill of it) also seems (at least somewhat) subjective. 

That doesn't mean "anything goes," but I don't yet see how your point about dialogue trees applies to "maybe a society of AIs would build abstractions we don't yet understand, so there'd be a translation problem between their language games and ours." 

Comment by Lukas_Gloor on Wittgenstein's Language Games and the Critique of the Natural Abstraction Hypothesis · 2023-03-16T09:39:47.982Z · LW · GW

There are correct and incorrect ways to play language games.

That's the crux. Wittgenstein himself believed otherwise and spent most of the book arguing against it. I think he makes good points.

At one point, he argues that there's no single correct interpretation for "What comes next in the sequence: '2, 4, 6, 8, 10, 12, ...?'" 

Maybe this goes a bit too far. :) I think he's right in some nitpicky sense, but for practical purposes, sane people will say "14" every time and that works well for us.
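To make the nitpicky sense concrete, here is a small Python sketch (my own illustration, not an argument Wittgenstein gives; the function names are hypothetical). The rule "two times the position" gives 2, 4, ..., 12 and then 14. However, adding any multiple of a term that is zero at positions 1 through 6 leaves the first six numbers unchanged while moving the seventh to whatever value we like.

```python
from fractions import Fraction


def rule(n, c):
    """Hypothetical rule: 2n plus c times a product that vanishes for n = 1..6."""
    bump = 1
    for k in range(1, 7):
        bump *= (n - k)  # zero whenever n is one of 1..6
    return 2 * n + c * bump


def coefficient_for(target):
    """Choose c so that the 7th term equals `target` (the bump at n = 7 is 6! = 720)."""
    return Fraction(target - 14, 720)


obvious = [rule(n, 0) for n in range(1, 8)]    # the "sane" continuation ending in 14
gerrymandered = [rule(n, 1) for n in range(1, 8)]  # same first six terms, 7th term is 734
```

Every such rule fits the data equally well, so the data alone can't pick one out. In practice, though, the extra term is exactly the kind of gerrymandered addition nobody reaches for, which is why "14" remains the answer that works.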

We can see this as a version of the realism vs. anti-realism debates: realism vs. anti-realism about natural abstractions. As I argue in the linked post, anti-realism is probably the right way of looking at most or even all of these, but that doesn't mean "anything goes." Sometimes there's ambiguity about our interpretations of things, but reality does have structure, and "ambiguity" isn't the same as "you can just make random stuff up and expect it to be useful."

Comment by Lukas_Gloor on Shutting Down the Lightcone Offices · 2023-03-15T01:04:38.662Z · LW · GW

Thanks for sharing your reasoning, that was very interesting to read! I kind of agree with the worldview outlined in the quoted messages from the "Closing-Office-Reasoning" channel. Something like "unless you go to extreme lengths to cultivate integrity and your ability to reason in truth-tracking ways, you'll become a part of the incentive-gradient landscape around you, which kills all your impact."

It seems like a tough call to decide whether an ecosystem has failed vs. whether it's still better than starting from scratch despite its flaws. (I could imagine that there's an instinct to just not think about it.)

Sometimes we also just get unlucky, though. (I don't think FTX was just bad luck, but e.g., with some of the ways AI stuff played out, I find it hard to tell. Of course, just because I find it hard to tell doesn't mean it's objectively hard to tell. Maybe some things really were stupid at the time, not just in hindsight.)

I'm curious if you think there are "good EA orgs" where you think the leadership satisfies the threshold needed to predictably be a force of good in the world (my view is yes!). If yes, do you think that this isn't necessarily enough for "building the EA movement" to be net positive? E.g., maybe you think it boosts the not-so-good orgs just as much as the good ones, and "burns the brand" in the process? 

I'd say that, if there are some "good EA orgs," that's a reason for optimism. We can emulate what's good about them and their culture. (It could still make sense to be against further growth if you believe the ratio has become too skewed.) Whereas, if there aren't any, then we're already in trouble, so there's a bit of a wager against it.

Comment by Lukas_Gloor on Success without dignity: a nearcasting story of avoiding catastrophe by luck · 2023-03-14T21:56:15.357Z · LW · GW

I think “Luck could be enough” should be the strong default on priors, so in some sense I don’t think I owe tons of argumentation here (I think the burden is on the other side).

I agree with this being the default and the burden being on the other side. At the same time, I don't think of it as a strong default.

Here's a frame that I have that already gets me to a more pessimistic (updated) prior:

It has almost never happened that people who developed and introduced a revolutionary new technology displayed a lot of foresight about its long-term consequences. For instance, there were comparatively few efforts at major social media companies to address ways in which social media might change society for the worse. The same goes for the food industry and the obesity epidemic or online dating and its effects on single parenthood rates. When people invent cool new technology, it makes the world better on some metrics but creates new problems on its own. The whole thing is accelerating and feels out of control.

It feels out of control because even if we get cool new things from tech progress, we don't seem to be getting any better at fixing the messiness that comes with it (misaligned incentives/Goodharting, other Molochian forces, world-destroying tech becoming ever more accessible). Your post says "a [...] story of avoiding catastrophe by luck." This framing makes it sound like things would be fine by default if it weren't for some catastrophe happening. However, humans have never seemed particularly "in control" of technological progress. For things to go well, we need the opposite of a catastrophe – a radical change towards the upside. We have to solve massive coordination problems and hope for a technology that gives us god-like power, finally putting sane and compassionate forces in control of the future. It so happens that we can tell a coherent story about how AI might do this for us. But to say that it might go right just by luck – I don't know, that seems far-fetched!

All of that said, I don't think we can get very far arguing from priors. What carries by far the most weight are arguments about alignment difficulty, takeoff speeds, etc. And I think it's a reasonable view to say that it's very unlikely that any researchers currently know enough to make highly confident statements about these variables. (Edit: So, I'm not sure we disagree too much – I think I'm more pessimistic about the future than you are, but I'm probably not as pessimistic as the position you're arguing against in this post. I mostly wanted to make the point that I think the "right" priors support at least moderate pessimism, which is a perspective I find oddly rare among EAs.) 

FWIW, it's not obvious to me that slow takeoff is best. Fast takeoff at least gives you god-like abilities early on, which are useful from a perspective of "we were never particularly in control of history; lots of underlying problems need fixing before we pass a point of no return." By contrast, with slow takeoff, coordination problems seem more difficult because (at least by default) there will be more actors using AIs in some way or other, and it's not obvious that the AIs in a slow-takeoff scenario will be all that helpful at facilitating coordination.

Comment by Lukas_Gloor on Taboo "compute overhang" · 2023-03-01T20:49:24.881Z · LW · GW

I don't understand this sentence:

Oh, yeah, I butchered that entire description. 

It's the gap between the training compute of 'the first AGI' and what?

What I had in mind was something like the gap between how much "intelligence" humans get from the compute they first build AGI with vs. how much "intelligence" AGI will get out of the same compute available, once it optimizes software progress for a few iterations.

So, the "gap" is a gap of intelligence rather than compute, but it's "intelligence per specified quantity of compute." (And that specified quantity is how much compute we used to build AGI in the first place.)

Comment by Lukas_Gloor on Taboo "compute overhang" · 2023-03-01T20:41:47.862Z · LW · GW

And I asked some friends what "hardware overhang" means and they had different responses (a plurality said it means sufficient hardware for human-level AI already exists, which is not a useful concept).

It's not a useful concept if we can't talk about the probability of "finding" particularly efficient AGI architectures through new insights. However, it seems intelligible and strategically important to talk about something like "the possibility that we're one/a few easy-to-find insight(s) away from suddenly being able to build AGI with a much smaller compute budget than the largest training runs to date." That's a contender for the concept of "compute overhang." (See also my other comment.) 

Maybe what you don't like about this definition is that it's inherently fuzzy: even if we knew everything about all possible AGI architectures, we'd still have uncertainty about how long it'll take AI researchers to come up with the respective insights. I agree that this makes the concept harder to reason about (and arguably less helpful).

Comment by Lukas_Gloor on Taboo "compute overhang" · 2023-03-01T20:18:38.263Z · LW · GW

I agree that it seems best for people to define the concept whenever they use it.

Instead of tabooing it, we could declare a canonical definition. I think the best candidate is something like: there is a compute overhang to the extent that the largest training runs could quickly be scaled up.

This proposal has little to do with hard vs. soft takeoff, which (IIRC) was the context in which Bostrom used "hardware overhang" in Superintelligence.

One thing that made the discussion confusing is that Bostrom originally discussed hard vs. soft takeoff as having relevance only after we build AGI, whereas Paul Christiano's view on soft takeoff introduced the idea that "takeoff" already starts before AGI.

This made me think that it could be useful to distinguish between "post-AGI" and "pre-AGI" compute overhangs. It could go as follows:

Pre-AGI compute overhang:

There's a pre-AGI compute overhang to the degree that the following could happen: we invent an algorithm that will get us to AGI before we scale up training runs to the biggest attainable sizes (on some short timescale). 

So, on this definition, there are two ways in which we might already be in a pre-AGI compute overhang:

(1) Timelines are very short and we could get AGI with "current algorithms" (not necessarily GPT_n with zero tweaks, but obvious things to try that require no special insight) with less scaling effort than a Manhattan project.

(2) We couldn't get AGI with current algorithms via any less-than-maximal scaling effort (and maybe not even with a maximal one – that part isn't relevant for the claim), but there are highly significant algorithmic insights waiting for us (that we have a realistic chance of discovering). Once we incorporate these insights, we'll be in the same situation as described in (1).

Post-AGI compute overhang:

Once we build AGI with realistic means, using far-from-optimal algorithms, how much room is there for it to improve its algorithms during "takeoff"/intelligence explosion? "Post-AGI compute overhang" here describes the gap between compute used to build AGI in the first place vs. more efficient designs that AI-aided progress could quickly discover.

[Edit: Correction: "Post-AGI compute overhang" here describes the gap in "intelligence" of the first AGI vs. the "intelligence" of a more efficient design (using the same amount of training compute as that first AGI) that AI-aided progress could quickly discover.]

On that definition, it's actually quite straightforward that shorter timelines imply a smaller compute overhang (so maybe that's what Sam Altman meant here).

Comment by Lukas_Gloor on Enemies vs Malefactors · 2023-03-01T17:52:59.925Z · LW · GW

My stance is "the more we promote awareness of the psychological landscape around destructive patterns of behavior, the better." This isn't necessarily at odds with what you're saying because "the psychological landscape" is a descriptive thing, whereas your objection to Nate's proposal is that it seeks to be "immediately-decision-relevant," i.e., that it's normative (or comes with direct normative implications). 

So, maybe I'd agree that "malefactor" might be slightly too simplistic a classification (because we may want to draw action-relevant boundaries in different places depending on the context – e.g., different situations call for different degrees of risk tolerance of false positives vs. false negatives).

That said, I think there's an important message in Nate's post and (if I had to choose one or the other) I'm more concerned about people not internalizing that message than about it potentially feeding ammunition to witch hunts. (After all, someone who internalizes Nate's message will probably become more concerned about the possibility of witch hunts – if only explicitly-badly-intentioned people instigated witch hunts or added fuel to the fires, history would look very different.)

Comment by Lukas_Gloor on Enemies vs Malefactors · 2023-03-01T16:48:54.809Z · LW · GW

If there's a legitimate belief that they're trying to do better and act in good faith, then what can be done to actually empower them to change in a positive direction? That's of course if they actually want to change, if they're keeping themselves in a state that causes harm because it benefits them while insisting its fine, well, to steal a sith's turn of phrase: airlocked

I agree that it's important to give people constructive feedback to help them change. However, I see some caveats around this (I think I'm expanding on the points in your comment rather than disagreeing with it). Sometimes it's easier said than done. If part of a person's "destructive pattern" is that they react with utter contempt when you give them well-meant and (reasonably-)well-presented feedback, it's understandable if you don't want to put yourself in the crossfire. In that case, you can always try to avoid contact with someone. Then, if others ask you why you're doing this, you can say something that conveys your honest impressions while making clear that you haven't given this other person much of a chance.

Just like it's important to help people change, I think it's also important to seriously consider the hypothesis that some people are so stuck in their destructive patterns that giving constructive feedback is no longer justifiable in terms of social opportunity costs. (E.g., why invest 100s of hours helping someone become slightly less destructive if you can promote social harmony 50x better by putting your energy into pretty much anyone else.) 

Someone might object as follows. "If someone is 'well-intentioned,' isn't there a series of words you* can kindly say to them so that they'll gain insight into their situation and they'll be able to change?" 

I think the answer here is "no" and I think that's one of the saddest things about life. Even if the answer were "yes, BUT, ...", I think that wouldn't change too much and would still be sad.

*(Edit) Instead of "you can kindly say to them," the objection seems stronger if this said "someone can kindly say to them." Therapists are well-positioned to help people because they start with a clean history. Accepting feedback from someone you have a messy history with (or feel competitive with, or all kinds of other complications) is going to be much more difficult than the ideal scenario.

One data point that seems relevant here is success probabilities for evidence-based treatments of personality disorders. I don't think personality disorders capture everything about "destructive patterns" (for instance, one obvious thing that they miss is "person behaves destructively due to an addiction"), nor do I think that personality disorders perfectly carve reality at its joints (most traits seem to come on a spectrum!). Still, it seems informative that the treatment success for narcissistic personality disorder seems comparatively very low (but not zero!) for people who are diagnosed with it, in addition to it being vastly under-diagnosed since people with pathological narcissism are less likely to seek therapy voluntarily. (Note that this isn't the case for all personality disorders – e.g., I think I read that BPD without narcissism as a comorbidity has something like 80% chance of improvement with evidence-based therapy.) These stats are some indication that there are differences in people's brain wiring or conditioned patterns that are deep enough that they can't easily be changed with lots of well-intentioned and well-informed communication (e.g., trying to change beliefs about oneself and others). 

So, I think it's a trap to assume that being 'well-intentioned' means that a person is always likely to improve with feedback. Even if, from the outside, it looks as though someone would change if only they could let go of a particular mindset or set of beliefs that seems to be the cause behind their "destructive patterns," consider the possibility that this mindset is a symptom rather than the cause (and that the underlying cause is really hard to address).

Comment by Lukas_Gloor on Sam Altman: "Planning for AGI and beyond" · 2023-03-01T13:02:36.006Z · LW · GW

I guess another way to use the concept is the following: 

Once we build AGI with realistic means, using far-from-optimal algorithms, how much room is there for it to improve its algorithms during "takeoff"/intelligence explosion? "Compute overhang" here describes the gap between compute used to build AGI in the first place vs. more efficient designs that AI-aided progress could quickly discover.

On that definition, it's actually quite straightforward that shorter timelines imply less compute overhang. 

Also, this definition arguably matches the context from Bostrom's Superintelligence more closely, where I first came across the concept of a "hardware overhang." Bostrom introduced the concept when he was discussing hard takeoff vs. soft takeoff.

(To complicate matters, there's been a shift in takeoff speeds discussions where many people are now talking about pre-TAI/pre-AGI speeds of progress, whereas Bostrom was originally focusing on claims about post-AGI speeds of progress.)

Comment by Lukas_Gloor on Sam Altman: "Planning for AGI and beyond" · 2023-03-01T12:44:34.161Z · LW · GW

It was brought to my attention that not everyone might use the concept of a "compute overhang" the same way.

In my terminology, there's a (probabilistic) compute overhang to the degree that the following could happen: we invent an algorithm that will get us to TAI before we even max out compute scaling as much as we currently could.

So, on my definition, there are two ways in which we might already be in a compute overhang:

(1) Timelines are very short and we could get TAI with "current algorithms" (not necessarily GPT_n with zero tweaks, but obvious things to try that require no special insight) with less scaling effort than a Manhattan project.

(2) We couldn't get TAI with current algorithms via any less-than-maximal scaling effort (and maybe not even with a maximal one – that part isn't relevant for the claim), but there are highly significant algorithmic insights waiting for us (that we have a realistic chance of discovering). Once we incorporate these insights, we'll be in the same situation as described in (1).

I would've guessed that Sam Altman was using it the same way, but now I'm not sure anymore. 

Comment by Lukas_Gloor on Sam Altman: "Planning for AGI and beyond" · 2023-02-25T10:35:15.618Z · LW · GW

Okay, I'm also not sure if I agree with the conclusion, but the argument makes sense that way. I just feel like it's a confusing use of terminology.

I think it would be clearer to phrase it slightly differently to distinguish "(a) we keep working on TAI and it takes ~10 years to build" from "(b) we stop research for 10 years and then build AGI almost immediately, which also takes ~10 years." Both of those are "10 year timelines," but (a) makes a claim about the dangers of not pushing forward as much as possible and (a) has higher "2020 training compute requirements" (the notion from Ajeya's framework to estimate timelines given the assumption of continued research) than (b) because it involves more algorithmic progress.

Comment by Lukas_Gloor on Sam Altman: "Planning for AGI and beyond" · 2023-02-24T22:04:08.699Z · LW · GW

Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination and more likely to lead to a slower takeoff due to less of a compute overhang, and a slower takeoff gives us more time to figure out empirically how to solve the safety problem and how to adapt.

I don't understand the part of "less of a compute overhang." There's still room to ramp up compute use in the next few years, so if timelines are very short, that means transformative AI happens when we're not yet pushing the limits of compute. That seems to me where the compute overhang is uncontroversially quite large?

Conceivably, there could also be a large compute overhang in other scenarios (where actors are pushing the competitive limits of compute use). However, wouldn't that depend on the nature of algorithmic progress? If you think present-day algorithms combined with "small" improvements can't get us to transformative AI but a single (or a small number of) game-changing algorithmic insight(s) will get us there, then I agree that "the longer it takes us to find the algorithmic insight(s), the bigger the compute overhang." Is that the view here?

If so, that would be good to know because I thought many people were somewhat confident that algorithmic progress is unlikely to be "jumpy" in that way? (Admittedly, that never seemed like a rock-solid assumption to me.) If not, does anyone know how this statement about short timelines implying less of a compute overhang was meant? 

Comment by Lukas_Gloor on Consent Isn't Always Enough · 2023-02-24T16:00:30.724Z · LW · GW

And let's further imagine that [...] both have a sophisticated understanding of power dynamics, great communication, solid introspection, strong self-confidence, and the best of intentions.

Somewhat of a separate point: People may tend to overestimate these factors. A friend brought up the argument that it seems to them that people sleeping with each other often makes things awkward or creates a mess of some sort. If true, that point by itself, independently of any further reasoning about whether anyone is morally at fault and independently of questions about bad incentives around power dynamics, is an (impact-driven) argument to not do it in your professional community.

Comment by Lukas_Gloor on The shallow reality of 'deep learning theory' · 2023-02-22T14:02:36.085Z · LW · GW

What if the arguments are more philosophy than math? In that case, I'd say there's still some incentive to talk to the experts who are most familiar with the math, but a bit less so? 

Comment by Lukas_Gloor on Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky · 2023-02-21T22:19:20.328Z · LW · GW

I liked the sincerity of the podcast hosts and how they adjusted their plan for the podcast by not going into "importance of AI for crypto" questions after realizing that Eliezer's view is basically "we all seem doomed, so obviously stuff like crypto is a bit pointless to talk about." 

Comment by Lukas_Gloor on Fucking Goddamn Basics of Rationalist Discourse · 2023-02-06T10:17:53.274Z · LW · GW

I used fashion as an example to argue that just because something has a reliable referent in the minds of a population at a given time, doesn't mean it's a property that isn't largely content-free and determined in a fairly arbitrary and fickle way.

I think I understood that part.

No Lukas, that's false. For instance, I sometimes let people know that I went to music school for 7 years and have lots of music school friends, which comes along with the (true) implication that I'm part of an in-group of musicians — an in-group that I've dedicated a chunk of my life to — and normally the reaction either one of disinterest, or one of interest and enthusiasm, but not cringe.

You're right that this example doesn't seem cringy. But if you shared a meme that said "seven ways you can tell someone went to music school" – that would be cringy.

So, my hypothesis is that cringiness is largely about signalling in-group membership in an "on the nose" way that only appeals to that in-group.

By contrast, saying "I went to music school for 7 years" is something you can use to make conversation with someone who didn't go to music school.

This pattern may not capture all instances of cringiness, but I think it captures quite a lot of it. And, like with "caring about fashion," "caring about belonging to the in-group and bonding with other in-group members through cringiness" is an identifiable meta trait that people can pursue reliably even when the underlying signals keep changing.

Comment by Lukas_Gloor on Fucking Goddamn Basics of Rationalist Discourse · 2023-02-06T01:28:46.144Z · LW · GW

I think "cringy" isn't analogous to "fashionable." Instead, I would say "cringy" is analogous to "acting so as to care about what's fashionable."

Yes, it might change what action specifically is cringy. But it's always cringy to do something that non-subtly signals how much you're a part of an in-group.

Used that way, it's not mind-killing at all to make people aware that they're signalling in-groupiness in a non-subtle way and therefore predictably turning off lots of people. 

Comment by Lukas_Gloor on Alexander and Yudkowsky on AGI goals · 2023-01-25T15:14:47.504Z · LW · GW

Thanks for posting this! 

I really liked Scott's first question in the section "Analogies to human moral development" and the discussion that ensued there. 

I think Eliezer's reply at [14:21] is especially interesting. If I understand it correctly, he's saying that it was a (fortunate) coincidence about what sort of moves evolution had available and what the developmental constraints were at the time, that "build in empathy/pro-social emotions" was an easy way to make people better at earning social rewards from our environment. [And maybe a further argument here is that once we start climbing upward on the gradient towards more empathy, the strategy of "also simultaneously become better at lying and deceiving" no longer gives highest rewards, because there are tradeoffs where it's bad to have (automatically accessible, ever-present) pro-social emotions if you go for a manipulative and exploitative life-strategy.]

By contrast, probably the next part of the argument is that we have no strong reason to expect gradient updates in ML agents to stumble upon a similarly simple attractor as "increase your propensity to experience compassion or feel others' emotions when you're anyway already modeling others' behavior based on what you'd do yourself in their situation." And is this because gradient descent updates too many things at once and there aren't any developmental constraints that would make a simple trick like "dial up pro-social emotions" reliably more successful than alternatives that involve more deception? That seems somewhat plausible to me, but I have some lingering doubts of the form "isn't there a sense in which honesty is strictly easier than deception (related: entangled truths, contagious lies), so ML agents might just stumble upon it if we try to reward them for socially cooperative behavior?"

What's the argument against that? (I'm not arguing for a high probability of "alignment by default" – just against confidently estimating it at <10%.) 

Somewhat related: In the context of Shard theory, I shared some speculative thoughts on developmental constraints arguably making it easier (compared to what things could be like if evolution had easier access to more of "mind-design space") to distinguish pro-social from anti-social phenotypes among humans. Mimicking some of these conditions (if we understood AI internals well enough to steer things) could maybe be a promising component for alignment work?

Comment by Lukas_Gloor on Covid 1/12/23: Unexpected Spike in Deaths · 2023-01-12T15:17:27.759Z · LW · GW

Great post! It's sadly rare to see people talk about Covid with the nuance to have a position in between "long covid (or XBB.1.5) is a threat to civilization" and "long covid (or XBB.1.5) is no concern at all."


Comment by Lukas_Gloor on Let’s think about slowing down AI · 2023-01-10T18:36:51.689Z · LW · GW

Those are good points. There are some considerations that go in the other direction. Sometimes it's not obvious what's a "failure to convince people" vs. "a failure of some people to be convincible." (I mean convincible by object-level arguments as opposed to convincible through social cascades where a particular new view reaches critical mass.) 

I believe both of the following: 

  • Persuasion efforts haven't been exhausted yet: we can do better at reaching not-yet-safety-concerned AI researchers. (That said, I think it's at least worth considering that we're getting close to exhausting low-hanging fruit?)
  • Even so, "persuasion as the main pillar of a strategy" is somewhat likely to be massively inadequate because it's difficult to change the minds and culture of humans in general (even if they're smart), let alone existing organizations.

Another point that's maybe worth highlighting is that the people who could make large demands don't have to be the same people who are best-positioned for making smaller asks. (This is Katja's point about there not being a need for everyone to coordinate into a single "we.") The welfarism vs. abolitionism debate in animal advocacy and discussion of the radical flank effect seems related. I also agree with a point lc makes in his post on slowing down AI. He points out that there's arguably a "missing mood" around the way most people in EA and the AI alignment community communicate with safety-unconcerned researchers. The missing sense of urgency probably lowers the chance of successful persuasion efforts?

Lastly, it's a challenge that there's little consensus in the EA research community around important questions like "How hard is AI alignment?," "How hard is alignment conditional on <5 years to TAI?," and "How long are TAI timelines?" (Though maybe there's quite some agreement on the second one, and the answer is at least "it's not easy.")

I'd imagine there would at least be quite a strong EA expert consensus on the following conditional statement (which has both normative and empirical components):

Let's call it "We're in Inconvenient World" if it's true that, absent strong countermeasures, we'll have misaligned AI that brings about human extinction in <5 years. If the chance that we're in Inconvenient World is 10% or higher, we should urgently make large changes to the way AI development progresses as a field/industry.

Based on this, some further questions one could try to estimate are: 

  • How many people (perhaps weighted by their social standing within an organization, opinion leaders, etc.) are convincible of the above conditional statement? Is it likely we could reach a critical mass?
    • Doing this for any specific org (or relevant branch of government, etc.) that seems to play a central role
  • What's the minimum consensus threshold for "We're in Inconvenient World?" (I.e., what percentage would be indefensibly low to believe in light of peer disagreement unless one considers oneself the world's foremost authority on the question?)

Comment by Lukas_Gloor on Covid 1/5/23: Various XBB Takes · 2023-01-06T11:32:37.601Z · LW · GW

Is it terrifying because you think it's bad if some people kill themselves with the program who otherwise wouldn't have? Or is it terrifying that things are so bad in general that, once you give them more options, more will choose death?

If you mean the former – I really don't share this judgment. Maybe you're worried that sociopathic or act-utilitarian doctors will pressure patients – but these risks could be mitigated with safeguards, and they have to be weighed against the benefits (people being able to make informed choices about what they want and have their options increased). 

If it's the latter, then I agree. But I'm not surprised by this information, sadly.

Edit: And the same question to Zvi:

Richard’s core argument boils down to Yay Individual Liberty, and that there are a lot of people out there suffering quite a lot. These are important points. My main takeaway, however, was that yes the whole thing already, in its current state, seems rather terrifying.

What does this mean? What's the implication for policies you would push for? Should other countries try to install something similar or would that be terrible? 

I guess it's reasonable to be like "I don't know, seems like a tough call and terrifying either way." But there's a risk that, if one feels an impulse to shy away from contemplating some topic in depth because it seems "terrifying," it leads to biased opinions. 

Comment by Lukas_Gloor on Covid 1/5/23: Various XBB Takes · 2023-01-06T11:13:41.399Z · LW · GW

Relatedly, I argue against the “culture of life” argument on the grounds that we place too much value on human life and it would be better if we placed less on it, a view that conservatives implicitly hold on topics like covid restrictions.

All right then.

He's just saying that it's unsustainable to have sacred values when there are tradeoffs to everything. That's a point you'd accept in many contexts, so reacting with a sneer here seems a bit unfair.

That said, I think his point would've sounded a lot better if he added a sentence like "placing less value on human life means we get to place more value on other things that also matter."

The way he said it, it indeed sounds a bit dystopian – but what's the alternative? It's difficult to estimate what numbers for assisted suicide are "high" or "low" in the sense that matters morally. Some percentage of these people would have committed suicide even without assistance, and you'd probably agree that those people are unambiguously better off with the assistance. Then there are people who would accept assistance but wouldn't want to kill themselves in a messy way. That seems very reasonable – doing it with household methods is super scary and probably more traumatizing for anyone you leave behind. Not to mention that there's a risk that it doesn't quite succeed and you're left way worse off than before! So, you can't just go from "Some people kill themselves with assistance who otherwise wouldn't" to "Therefore, it's preposterous that this doctor describes assisted suicide as moral progress."

Sometimes people want to do something difficult that they think is good for them, but they don't dare to do it because it's really aversive. Consider how many people didn't ask out someone they had a crush on because they were too scared – asking out your crush is probably a hundred times easier than jumping off a bridge or hanging yourself or whatnot.

Unless there's something particularly bad about how the medically assisted suicide program is implemented, I indeed consider this sort of thing moral progress. 

Comment by Lukas_Gloor on 2022 was the year AGI arrived (Just don't call it that) · 2023-01-04T17:23:47.450Z · LW · GW

I like the diagram with the "you are here!" 

I hope that this isn't actually where we are, but I'm noticing that it's becoming increasingly hard to argue that we're not there yet. That's pretty concerning!

Comment by Lukas_Gloor on Sazen · 2022-12-21T12:42:29.010Z · LW · GW

Yeah! What you're describing is something a bit in between the informal definition of sazen:

More informally: it's a handle that is useful as a pointer to the already-initiated, who can recognize its correctness and fill in the necessary gaps, but either useless or actively misleading to the uninitiated, who will either Simply Not Get It, or (much worse) fill in the gaps with their own preconceptions (which are likely to lead them astray).

And something that allows for some "uninitiated" people to quickly get it (if they've previously been thinking along similar lines). 

However, there are people who will never get a new concept based on sparse info. Some people don't look at reality to ask themselves "What could be the concept that my interlocutor thinks carves an important aspect of reality at its joints? What might they be pointing at?" Instead, they only look at your words and the concepts they associate with these words, and they never do the back-and-forth between words-as-pointers and reality-where-the-structure-is that would help you communicate. They interpret all your words rigidly, without treating words as pointers (these are often people who do well at academic writing in formal contexts). You have to take them step by step through an entire sequence if you want a chance of conveying your creative discoveries to them.

Comment by Lukas_Gloor on Take 7: You should talk about "the human's utility function" less. · 2022-12-09T12:21:14.396Z · LW · GW

Ineffective values do not need to be considered for a utility function as they do not affect what gets strived for. If you say "I will choose B" and still choose A you are still choosing A. You are not required to be aware of your utility function.

Uff, a future where humans get more of what they're striving for but without adjusting for biases and ineffectual values? Why would you care about saving our species, then? 

It sounds like people are using "utility function" in different ways in this thread. 

Comment by Lukas_Gloor on Take 7: You should talk about "the human's utility function" less. · 2022-12-09T01:32:09.797Z · LW · GW

Questions like "what would this human do in a situation where there is a cat in a room" has a unique answer that reflects reality, as if that kind of situation were run then something would need to happen.

It's not about what the human would do in a given situation. It's about values – not everything we do reflects our values. Eating meat when you'd rather be vegetarian, smoking when you'd rather not, etc. How do you distinguish biases from fundamental intuitions? How do you infer values from mere observations of behavior? There are a bunch of problems described in this sequence. Not to mention stuff I discuss here about how values may remain under-defined even if we specify a suitable reflection procedure and have people undergo that procedure. 

Comment by Lukas_Gloor on Covid 12/8/22: Another Winter Wave · 2022-12-08T22:45:55.943Z · LW · GW

Comment by Lukas_Gloor on Sadly, FTX · 2022-11-18T12:53:06.842Z · LW · GW

Are you saying that Madoff would have been less of a fraud if he had sold some of his hairs for $1,000 each to co-conspirators, then noted down the market value of his remaining hair in the billions and posted it as collateral for debts and also as backing for people invested in his Ponzi scheme? 

I guess that's technically true because people should've known what they were buying (FTT worked as advertised*). But it seems like a small difference to me.

*EXCEPT that no one advertised that it would be used to secure deposits or be relied on heavily as collateral.

Comment by Lukas_Gloor on Noting an unsubstantiated communal belief about the FTX disaster · 2022-11-13T13:25:08.205Z · LW · GW

Double-posted as an afterthought and kept the comments separate because they say separate things (so people can vote separately).

The type of view that my other comment ("I don't think this changes anything") is proactively replying to is this one:

(Maybe this is obvious, but sometimes I hear people say "I can't imagine that he isn't serious about EA" as though it makes other things about someone's character impossible, which is not true.) 

Comment by Lukas_Gloor on Noting an unsubstantiated communal belief about the FTX disaster · 2022-11-13T11:05:20.427Z · LW · GW

I don't think this changes anything. It's still possible for someone with EA motivations to have dark triad traits, so I wouldn't say "he was motivated by EA principles" implies that the same thing could've happened to almost anyone with EA principles. (What probably could've happened to more EAs is being complicit in the inner circle as lieutenants.)

"Feeling good about being a hero" is a motivation that people with dark triad traits can have just like anyone else. (The same goes for being deeply interested in and obsessed with certain intellectual pursuits, like moral philosophy or applying utilitarianism to your life.) Let's assume someone has a dark triad personality. I model people like that as the same as a more neurotypical person, except that they:

  • Feel about 99.9-100% of people the same way I feel about people I find annoying and unsympathetic.
  • Don't have any system-1 fear of bad consequences. Don't have any worries related to things like guilt or shame (or maybe they do have issues around shame, but these express themselves more as externalizing negative emotions like jealousy or spite).
  • Find it uncannily easy to move on from close relationships or shut empathy on and off at will as circumstances change regarding what's advantageous for them (if they ever form closer connections in the first place).

There are more factors that are different, but with some of them, you wonder whether they're just consequences of the above. For instance, being power-hungry: if you can't find meaning in close relationships, what else is there to do? Or habitual lying: if you find nearly everyone unsympathetic and annoying and you don't experience the emotion of guilt, you probably find it easier (and more pleasant) to lie.

In short, I think people with dark triad traits lack a bunch of prosocial system-1 stuff, but they can totally aim to pursue system-2 goals like "wanting to be a hero" like anyone else. 

(Maybe this is obvious, but sometimes I hear people say "I can't imagine that he isn't serious about EA" as though it makes other things about someone's character impossible, which is not true.)