Comments
I agree that it sounds somewhat premature to write off Larry Page based on attitudes he had a long time ago, when AGI seemed more abstract and far away, and then not seek/try communication with him again later on. If that were Musk's true and only reason for founding OpenAI, then I agree that this was a communication fuckup.
However, my best guess is that this story about Page was interchangeable with a number of alternative plausible criticisms of his competition on building AGI that Musk would likely have come up with in nearby worlds. People like Musk (and Altman too) tend to have a desire to do the most important thing and the belief that they can do this thing a lot better than anyone else. On that assumption, it's not too surprising that Musk found a reason for having to step in and build AGI himself. In fact, on this view, we should expect to see surprisingly little sincere exploration of "joining someone else's project to improve it" solutions.
I don't think this is necessarily a bad attitude. Sometimes people who think this way are right in the specific situation. It just means that we see the following patterns a lot:
- Ambitious people start their own thing rather than join some existing thing.
- Ambitious people have falling-outs with each other after starting a project together in which the question of "who eventually gets de facto ultimate control" wasn't fully specified from the start.
(Edited away a last paragraph that used to be here 50mins after posting. Wanted to express something like "Sometimes communication only prolongs the inevitable," but that sounds maybe a bit too negative because even if you're going to fall out eventually, probably good communication can help make it less bad.)
I thought the part you quoted was quite concerning, also in the context of what comes afterwards:
Hiatus: Sam told Greg and Ilya he needs to step away for 10 days to think. Needs to figure out how much he can trust them and how much he wants to work with them. Said he will come back after that and figure out how much time he wants to spend.
Sure, the email by Sutskever and Brockman gave some nonviolent communication vibes and maybe it isn't "the professional thing" to air one's feelings and perceived mistakes like that, but they seemed genuine in what they wrote and they raised incredibly important concerns that are difficult in nature to bring up. Also, with hindsight especially, it seems like they had valid reasons to be concerned about Altman's power-seeking tendencies!
When someone expresses legitimate-given-the-situation concerns about your alignment and your reaction is to basically gaslight them into thinking they did something wrong for finding it hard to trust you, and then you make it seem like you are the poor victim who needs 10 days off of work to figure out whether you can still trust them, that feels messed up! (It's also a bit hypocritical because the whole "I need 10 days to figure out if I can still trust you, given that you think I like being CEO a bit too much" reaction seems childish too.)
(Of course, these emails are just snapshots and we might be missing things that happened in between via other channels of communication, including in-person talks.)
Also, I find it interesting that they (Sutskever and Brockman) criticized Musk just as much as Altman (if I understood their email correctly), so this should make it easier for Altman to react with grace. I guess given Musk's own annoyed reaction, maybe Altman was calling the others' email childish to side with Musk's dismissive reaction to that same email.
Lastly, this email thread made me wonder what happened between Brockman and Sutskever in the meantime, since it now seems like Brockman no longer holds the same concerns about Altman even though recent events seem to have given them a lot of new fuel.
Some of the points you make don't apply to online poker. But I imagine that the most interesting rationality lessons from poker come from studying other players and exploiting them, rather than memorizing and developing an intuition for the pure game theory of the game.
- If you did want to focus on the latter goal, you can play online poker (many players play more than 12 tables at once) and after every session, run your hand histories through a program (e.g., "GTO Wizard") that will tell you where you made mistakes compared to optimal strategy, and how much they would cost you against an optimally playing opponent. Then, for any mistake, you can even input the specific spot into the trainer program and practice it with similar hands 4-tabling against the computer, with immediate feedback every time on how you played the spot.
It seems important to establish whether we are in fact going to be in a race and whether one side isn't already far ahead.
With racing, there's a difference between optimizing the chance of winning vs optimizing the extent to which you beat the other party when you do win. If it's true that China is currently pretty far behind, and if TAI timelines are fairly short so that a lead now is pretty significant, then the best version of "racing" shouldn't be "get to the finish line as fast as possible." Instead, it should be "use your lead to your advantage." So, the lead time should be used to reduce risks.
Not sure this is relevant to your post in particular; I could've made this point also in other discussions about racing. Of course, if a lead is small or non-existent, the considerations will be different.
I wrote a long post last year saying basically that.
Even if attaining a total and forevermore cessation of suffering is substantially more difficult/attainable by substantially fewer people in one lifetime, I don't think it's unreasonable to think that most people could suffer at least 50 percent less with dedicated mindfulness practice. I'm curious as to what might feed an opposing intuition for you! I'd be quite excited about empirical research that investigates the tractability and scalability of meditation for reducing suffering, in either case.
My sense is that existing mindfulness studies don't show the sort of impressive results that we'd expect if this were a great solution.
Also, I think people who would benefit most from having less day-to-day suffering often struggle with having no "free room" available for meditation practice, and that seems like an issue that's hard to overcome even if meditation practice would indeed help them a lot.
It's already a sign of having a decently good life when you're able to start dedicating time to something like meditation, which I think requires a bit more mental energy than just watching series or scrolling through the internet. A lot of people have leisure time, but it's a privilege to be mentally well off enough to do purposeful activities during your leisure time. The people who have a lot of this purposeful time probably (usually) aren't among the ones that suffer most (whereas the people who don't have it will struggle to stick to regular meditation practice, for good reasons).
For instance, if someone has a chronic illness with frequent pain and nearly constant fatigue, I can see how it might be good for them to practice meditation for pain management, but higher up on their priority list are probably things like "how do I manage to do daily chores despite low energy levels?" or "how do I not get let go at work?"
Similarly, for other things people may struggle with (addictions, financial worries, anxieties of various sorts; other mental health issues), meditation is often something that would probably help, but it doesn't feel like priority number one for people with problem-ridden, difficult lives. It's pretty hard to keep up motivation for training something that you're not fully convinced is your top priority, especially if you're struggling with other things.
I see meditation as similar to things like "eat healthier, exercise more, go to sleep on time, and don't consume distracting content or too much light in the late evenings, etc." These things have great benefits, but they're also hard, so there's no low-hanging fruit, and interventions in this space will have limited effectiveness (or at least limited cost-effectiveness; you could probably get quite far if you gifted people their own private nutritionist and cook, fitness trainer and motivator, house cleaner and personal assistant, and meditation coach, and gave them enough money for financial independence, etc.).
And then the people who would have enough "free room" to meditate may be well off enough to not feel like they need it? In some ways, the suffering of a person who is kind of well off in life isn't that bad and instead of devoting 1h per day for meditation practice to reduce the little suffering that they have, maybe the well-off person would rather take Spanish lessons, or train for a marathon, etc.
(By the way, would it be alright if I ping you privately to set up a meeting? I've been a fan of your writing since becoming familiar with you during my time at CLR and would love a chance to pick your brain about SFE stuff and hear about what you've been up to lately!)
I'll send you a DM!
[...] I am certainly interested to know if anyone is aware of sources that make a careful distinction between suffering and pain in arguing that suffering and its reduction is what we (should) care about.
I did so in my article on Tranquilism, so I broadly share your perspective!
I wouldn't go as far as what you're saying in endnote 9, though. I mean, I see some chance that you're right in the impractical sense of, "If someone gave up literally all they cared about in order to pursue ideal meditation training under ideal circumstances (and during the training they don't get any physical illness issues or otherwise have issues crop up that prevent successful completion of the training), then they could learn to control their mental states and avoid nearly all future sources of suffering." But that's pretty impractical even if true!
It's interesting, though, what you say about CBT. I agree it makes sense to be accurate about these distinctions, and that it could affect specific interventions (though maybe not at the largest scale of prioritization, the way I see the landscape).
This would be a valid rebuttal if instruction-tuned LLMs were only pretending to be benevolent as part of a long-term strategy to eventually take over the world, and execute a treacherous turn. Do you think present-day LLMs are doing that? (I don't)
Or that they have a sycophancy drive. Or that, next to "wanting to be helpful," they also have a bunch of other drives that will likely win over the "wanting to be helpful" part once the system becomes better at long-term planning and orienting its shards towards consequentialist goals.
On that latter model, the "wanting to be helpful" is a mask that the system is trained to play better and better, but it isn't the only thing the system wants to do, and it might find that once it gets good at trying on various other masks to see how this will improve its long-term planning, it for some reason prefers a different "mask" to become its locked-in personality.
I thought the first paragraph and the bolded bit of your comment seemed insightful. I don't see why what you're saying is wrong – it seems right to me (but I'm not sure).
I am not convinced MIRI has given enough evidence to support the idea that unregulated AI will kill everyone and their children.
The way you're expressing this feels like an unnecessarily strong bar.
I think advocacy for an AI pause already seems pretty sensible to me if we accept the following premises:
- The current AI research paradigm mostly makes progress in capabilities before progress in understanding.
(This puts AI progress in a different reference class from most other technological progress, so any arguments with base rates from "technological progress normally doesn't kill everyone" seem misguided.)
- AI could very well kill most of humanity, in the sense that it seems defensible to put this at anywhere from 20-80% (we can disagree on the specifics of that range, but that's where I'd put it looking at the landscape of experts who seem to be informed and doing careful reasoning (so not LeCun)).
- If we can't find a way to ensure that TAI is developed by researchers and leaders who act with a degree of responsibility proportional to the risks/stakes, it seems better to pause.
Edited to add the following:
There's also a sense in which whether to pause is quite independent from the default risk level. Even if the default risk were only 5%, if there were a solid and robust argument that pausing for five years will reduce it to 4%, that's clearly very good! (It would be unfortunate for the people who will die preventable deaths in the next five years, but it still helps overall more people to pause under these assumptions.)
Would most existing people accept a gamble with a 20% chance of death in the next 5 years and an 80% chance of life extension and radically better technology? I concede that many would, but I think it's far from universal, and I wouldn't be too surprised if half of people or more think this isn't for them.
I personally wouldn't want to take that gamble (strangely enough I've been quite happy lately and my life has been feeling meaningful, so the idea of dying in the next 5 years sucks).
(Also, I want to flag that I strongly disagree with your optimism.)
we have found Mr Altman highly forthcoming
That's exactly the line that made my heart sink.
I find it a weird thing to choose to say/emphasize.
The issue under discussion isn't whether Altman hid things from the new board; it's whether he hid things from the old board a long while ago.
Of course he's going to seem forthcoming towards the new board at first. So, the new board having the impression that he was forthcoming towards them? This isn't information that helps us much in assessing whether to side with Altman vs the old board. That makes me think: why report on it? It would be a more relevant update if Taylor or Summers were willing to stick their necks out a little further and say something stronger and more direct, something more in the direction of (hypothetically), "In all our by-now extensive interactions with Altman, we got the sense that he's the sort of person you can trust; in fact, he had surprisingly circumspect and credible things to say about what happened, and he seems self-aware about things that he could've done better (and those things seem comparatively small or at least very understandable)." If they had added something like that, it would have been more interesting and surprising. (At least for those who are currently skeptical or outright negative towards Altman; but also "surprising" in terms of "nice, the new board is really invested in forming their own views here!").
By contrast, this combination of basically defending Altman (and implying pretty negative things about Toner and McCauley's objectivity and their judgment on things that they deem fair to tell the media), but doing so without sticking their necks out, makes me worried that the board is less invested in outcomes and more invested in playing their role. By "not sticking their necks out," I mean the outsourcing of judgment-forming to the independent investigation and the mentioning of clearly unsurprising and not-very-relevant things like whether Altman has been forthcoming to them, so far. By "less invested in outcomes and more invested in playing their role," I mean the possibility that the new board maybe doesn't consider it important to form opinions at the object level (on Altman's character and his suitability for OpenAI's mission, and generally them having a burning desire to make the best CEO-related decisions). Instead, the alternative mode they could be in would be having in mind a specific "role" that board members play, which includes things like, e.g., "check whether Altman ever gets caught doing something outrageous," "check if he passes independent legal reviews," or "check if Altman's answers seem reassuring when we occasionally ask him critical questions." And then, that's it, job done. If that's the case, I think that'd be super unfortunate. The more important the org, the more it matters to have an engaged/invested board that considers itself ultimately responsible for CEO-related outcomes ("will history look back favorably on their choices regarding the CEO").
To sum up, I'd have much preferred it if their comments had either included them sticking their necks out a little more, or if I had gotten from them more of a sense of still withholding judgment. I think the latter would have been possible even in combination with still reminding the public that Altman (e.g.) passed that independent investigation or that some of the old board members' claims against him seem thinly supported, etc. (If that's their impression, fair enough.) For instance, it's perfectly possible to say something like, "In our duty as board members, we haven't noticed anything unusual or worrisome, but we'll continue to keep our eyes open." That's admittedly pretty similar, in substance, to what they actually said. Still, it would read as a lot more reassuring to me because of its different emphasis. My alternative phrasing would help convey that (1) they don't naively believe that Altman – in worlds where he is dodgy – would have likely already given things away easily in interactions with them, and (2) that they consider themselves responsible for the outcome (and not just the following of common procedures) of whether OpenAI will be led well and in line with its mission.
(Maybe they do in fact have these views, 1 and 2, but didn't do a good job here at reassuring me of that.)
Followed immediately by:
I too also have very strong concerns that we are putting a person whose highest stats are political maneuvering and deception, who is very high in power seeking, into this position. By all reports, you cannot trust what this man tells you.
For me, the key question in situations when leaders made a decision with really bad consequences is, "How did they engage with criticism and opposing views?"
If they did well on this front, then I don't think it's at all mandatory to push for leadership changes (though certainly, the worse someone's track record gets, the more that speaks against them).
By contrast, if leaders tried to make the opposition look stupid or if they otherwise used their influence to dampen the reach of opposing views, then being wrong later is unacceptable.
Basically, I want to allow for a situation where someone was like, "this is a tough call and I can see reasons why others wouldn't agree with me, but I think we should do this," and then ends up being wrong, but I don't want to allow situations where someone is wrong after having expressed something more like, "listen to me, I know better than you, go away."
In the first situation, it might still be warranted to push for leadership changes (esp. if there's actually a better alternative), but I don't see it as mandatory.
The author of the original shortform says we need to hold leaders accountable for bad decisions because otherwise the incentives are wrong. I agree with that, but I think it's too crude to tie incentives to whether a decision looks right or wrong in hindsight. We can do better and evaluate how someone went about making a decision and how they handled opposing views. (Basically, if opposing views weren't loud enough that you'd have had to actively squish them using your influence illegitimately, then the mistake isn't just yours as the leader; it's also that the situation wasn't sufficiently obvious to others around you.) I expect that everyone who has strong opinions on things and is ambitious and agenty in a leadership position is going to make some costly mistakes. The incentives shouldn't be such that leaders shy away from consequential interventions.
I agree with what you say in the first paragraph. If you're talking about Ilya, which I think you are, I can see what you mean in the second paragraph, but I'd flag that even if he had some sort of plan here, it seems pretty costly and also just bad norms for someone with his credibility to say something that indicates that he thinks OpenAI is on track to do well at handling their great responsibility, assuming he were to not actually believe this. It's one thing to not say negative things explicitly; it's a different thing to say something positive that rules out the negative interpretations. I tend to take people at their word if they say things explicitly, even if I can assume that they were facing various pressures. If I were to assume that Ilya is saying positive things that he doesn't actually believe, that wouldn't reflect well on him, IMO.
If we consider Jan Leike's situation, I think what you're saying applies more easily, because him leaving without comment already reflects poorly on OpenAI's standing on safety, and maybe he just decided that saying something explicitly doesn't really add a ton of information (esp. since maybe there are other people who might be in a better position to say things in the future). Also, I'm not sure it affects future employment prospects too much if someone leaves a company, signs a non-disparagement agreement, and goes "no comment" to indicate that there was probably dissatisfaction with some aspects of the company. There are many explanations for this and if I was making hiring decisions at some AI company, even if it's focused on profits quite a bit, I wouldn't necessarily interpret this as a negative signal.
That said, signing non-disparagement agreements certainly feels like it has costs and constrains option value, so it seems like a tough choice.
It seems likely (though not certain) that they signed non-disparagement agreements, so we may not see more damning statements from them even if that's how they feel. Also, Ilya at least said some positive things in his leaving announcement, so that indicates either that he caved in to pressure (or too high agreeableness towards former co-workers) or that he's genuinely not particularly worried about the direction of the company and that he left more because of reasons related to his new project.
I agree: appealing to libertarianism shouldn't automatically win someone the argument on whether it's okay to still have factory farms.
The fact that Zvi thought he provided enough of a pointer to an argument there feels weird, in my opinion.
That said, maybe he was mostly focused on wanting to highlight that a large subset of people who are strongly against this ban (and may use libertarian arguments to argue for their position) are only against bans when it suits their agenda. So, maybe the point was in a way more about specific people's hypocrisy in how they argue than the question of concern for animals.
Either way, I continue to appreciate all these newsletters and I admit that the opinionated tone often makes them more interesting/fun to read in cases where it's not triggering me on issues that I see differently.
I think one issue is that someone can be aware about a specific worldview's existence and even consider it a plausible worldview, but still be quite bad at understanding what it would imply/look like in practice if it were true.
For me personally, it's not that I explicitly singled out the scenario that happened and assigned it some very low probability. Instead, I think I mostly just thought about scenarios that all start from different assumptions, and that was that.
For instance, when reading Paul's "What failure looks like" (which I had done multiple times), I thought I understood the scenario and even explicitly assigned it significant likelihood, but as it turned out, I didn't really understand it because I never really thought in detail about "how do we get from a 2021 world (before ChatGPT) to something like the world where things go off the rails in Paul's description?" If I had asked myself that question, I'd probably have realized that his worldview implies that there probably isn't a clear-cut moment of "we built the first AGI!" where AI boxing has relevance.
I lean towards agreeing with the takeaway; I made a similar argument here and would still bet on the slope being very steep inside the human intelligence level.
In some of his books on evolution, Dawkins also said very similar things when commenting on Darwin vs Wallace, basically saying that there's no comparison, Darwin had a better grasp of things, justified it better and more extensively, didn't have muddled thinking about mechanisms, etc.
Very cool! I used to think Hume was the most ahead of his time, but this seems like the same feat if not better.
Yeah, you need an enormous bankroll to play $10,000 tournaments. What a lot of pros do is sell action. Let's say you're highly skilled and your expected return is, say, 125% of the buy-in (i.e., 25% profit on average). If you find someone with a big bankroll and they're convinced of your skills, you can sell them your action at a markup somewhere between 1 and 1.2, leaving them an expected profit. I'd say something like 1.1 markup is fairest, so you're paying them a good price to weather the variance for you. At 1.1 markup, they pay 1.1x whatever it costs you to buy into the tournament. You can sell a large part of your action, but not quite all of it, to keep an incentive to play well (if you sold everything at $11,000, you could, if you were shady, just pocket the extra $1,000, go out early on purpose, and register for the next tournament where you sold action for another round of instant profit).
So, let's say they paid you $8,800 to get 80% of your winnings, so they make an expected profit of ($8,000 * 1.25) - $8,800, which is $1,200. And then you yourself still have 20% of your own action, for which you only paid $1,200 (since you got $800 extra from the 1.1 markup and you invest that into your tournament). Now, you're only in for $1,200 of your own money, but you have 20% of the tournament, so you'd already be highly profitable if you were just breaking even. In addition, as we stipulated, you have an edge on the field expecting a 125% return, so in expectation, that $1,200 is worth $2,000 * 1.25, which is $2,500. This still comes with a lot of variance, but your ROI is now so high that Kelly allows you to play a big tournament in this way even if your net worth is <$100k.
(This analysis simplified things by assuming there's no casino fee. In reality, if a tournament is advertised as a $10k tournament, the buy-in tends to be more like $10,500, and $500 is just the casino fee that doesn't go into the prize pool. This makes edges considerably smaller.)
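For concreteness, here's a minimal sketch of the staking arithmetic above (Python; it just re-derives the $1,200 and $2,500 figures under the example's assumptions of a $10k buy-in, a 1.25x expected return, and 80% of action sold at 1.1 markup, ignoring fees):

```python
# A minimal sketch of the staking arithmetic from the example above
# ($10k buy-in, 1.25x expected return, 80% of action sold at 1.1 markup; fees ignored).

buy_in = 10_000
expected_return = 1.25    # you expect to get back 125% of the buy-in on average
sold_fraction = 0.80      # share of your winnings sold to the backer
markup = 1.1              # backer pays 1.1x the face value of that share

backer_pays = sold_fraction * buy_in * markup                            # $8,800
backer_profit = sold_fraction * buy_in * expected_return - backer_pays   # $1,200

your_cost = buy_in - backer_pays                                         # $1,200 out of pocket
your_stake_value = (1 - sold_fraction) * buy_in * expected_return        # $2,500 in expectation

print(f"Backer pays ${backer_pays:,.0f} for an expected profit of ${backer_profit:,.0f}")
print(f"You risk ${your_cost:,.0f} for a stake worth ${your_stake_value:,.0f} in expectation")
```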
Regarding busting a tournament with a risky bluff: In the comment above I was assuming we're playing a cash game where chips are equivalent to real dollars and you can leave the table at any point. In tournaments, at least if they are not "winner takes all" format (which they almost never are), there's additional expected value in playing a little more conservatively than the strategy of "maximizing expected value in chips." Namely, you have to figure out how "having lots of chips" translates into "probabilities of making various pay jumps." If you're close to the money, or close to a big pay jump when you're already in the money (and at a big final table, every pay jump tends to be huge!), you actually make money by folding, since every time you fold, there's a chance that some other player will go out (either behind you at your table, or at some other table in a tournament where there are still many tables playing). If someone else goes out and you make the pay jump, you get more money without having to risk your stack. So, in tournaments, you gotta be more selective with the big bluffs for multiples of what is already in the pot, especially if you think you have an edge on the field and if the pay jumps are close.
You also quote this part of the article:
Theo Boer, a healthcare ethics professor at Protestant Theological University in Groningen, served for a decade on a euthanasia review board in the Netherlands. “I entered the review committee in 2005, and I was there until 2014,” Boer told me. “In those years, I saw the Dutch euthanasia practice evolve from death being a last resort to death being a default option.” He ultimately resigned.
I found a submission by this Theo Boer for the UK parliament, where he explains his reasons for now opposing euthanasia in more detail.
He writes:
It is well known that British advocates of assisted dying argue for a more restricted law than is found in the low countries. Here is my prediction: any law that allows assisted dying will come to be experienced as an injustice and will be challenged in the courts. Why only euthanasia for terminally ill patients, who have access to an ever widening array of palliative care and whose suffering will be relatively short, whereas chronic patients may suffer more intensely and much longer? Why exclude psychiatric patients, many of whom are suffering most heartbreakingly of all? Why only an assisted death for people suffering from a disease, and not for those suffering from irremediable meaninglessness, alienation, loneliness, from life itself? We are presently seeing how in the years 2016-2023 Canada’s Medical Assistance in Dying (MAiD), from being euthanasia for terminal patients only, has evolved into an assisted death for patients whose chronic disease has become unbearable due to shortage of healthcare(Douthat 2022).
This is a "slope" of sorts, but I think it's not a bad one. The arguments for extending the practice all seem reasonable. What matters is, "are people suffering?" and, "are they right that there's not enough hope for them to justify continued suffering?"
Regarding pressure/being pressured, I thought this part was interesting:
This brings me to the second question: how to protect vulnerable citizens? Different from what is presently going on in Canada, I do not yet see a specific risk for citizens who by many are considered vulnerable – homeless, underinsured, people on welfare, people with disabilities. Although these groups are present in those who get euthanasia in the Netherlands, it is not my impression that they are overrepresented. If any group is well represented in the euthanasia numbers, it is the better-off, the healthy-aging population, the higher educated. In our research on practice variation, we found that in regions where the average experienced health is higher, the euthanasia numbers are also higher. In places where people on average are better off, obviously serious threats to their wellbeing tend to be more often a reason for a euthanasia request than in places where people are more used to dealing with life’s different hardships. This leads me to adopt a different definition of vulnerability, a vulnerability that may be found in all social and economic groups, from top to bottom: one of despair, meaninglessness, social isolation, feeling redundant. It may apply to wealthy citizens in a villa with woodblock floors and a grand piano, whose children have their businesses elsewhere and whose friends are either dead or institutionalized, just as much as to a single disabled woman on welfare. Anyone under this shadow of despair may make a euthanasia request, and there is no way a government can prevent this kind of vulnerability to motivate a euthanasia request, since the autonomous citizens are not under any other pressure than their own, that is, their own incapacity to face life’s harder episodes. “Life has always been a feast for me,” an elderly man whose euthanasia I assessed, “and that’s how it should end for me.”
I'd be curious to figure out why it is exactly that requests for euthanasia are higher in demographics where people tend to be better off/suffering less.
That said, I'm not at all convinced that this would prove that there's something horribly wrong going on with these developments after legalization of assisted suicide. (Still, I'd be curious to investigate this further.)
Reading this account, it feels to me like Theo Boer has a problem with death intrinsically, as opposed to only having a problem with death when a person has for-themselves good/strong reasons to continue to want to live. That's not an outlook I agree with.
"Their own incapacity to face life's harder episodes" is a question-begging phrasing. For all we know, many people who choose assisted suicide would voluntarily chose to continue with their suffering if there was more at stake that they cared about! For instance, if they learned that by continuing to suffer, they'd solve world poverty, they might continue to suffer. It seems wrong, then, to say they're "incapable," when the real reason is more about how they don't want it enough. It's their life, so their decision.
"Since the autonomous citizens are not under any other pressure than their own" – this is also an interesting perspective. He seems to be conceding that no matter how much society and relatives try to reassure chronically ill or disabled elderly people that they're still valued and cared about (something we absolutely must emphasize or work towards if it isn't everywhere the case!), those people will struggle with worries of being a burden. That's unfortunate, but also very natural. It's how I would feel too. But people who feel that way don't necessarily jump right towards considering assisted suicide! Consider two different cases:
- You still enjoy life/are happy.
- You have been feeling suicidal for many years, and there's no realistic hope for things to get better.
In which of these cases is "worries about being a burden" (even if you know on some level that these worries probably don't accurately reflect the reality of the views of your caretakers or loved ones) a bigger reason to sway your decision towards wanting euthanasia? Obviously, it is in the second case, where you lack positive reasons to stay alive, so negative reasons weigh more comparatively, even if they're quite weak in absolute terms. In fact, what if "what will other people think" had been the primary motivation that kept you wanting to live for a long time, as long as you could provide more value for your relatives or loved ones? Is there not also something disconcerting about that? (To underscore this point, many people who are depressed write that the main reason they don't consider suicide more seriously is because of what it would do to their relatives and loved ones. If you want to see for yourself, you can read reddit threads on this for examples of people's suicidal ideation.)
“Life has always been a feast for me,” an elderly man whose euthanasia I assessed, “and that’s how it should end for me.”
This quote comes at the end of a passage that was all about "pressure," but it has nothing to do with pressure anymore, nor does it have to do with being "incapable of facing life's hardship." Instead, it just sounds like this person disagrees about the view that "facing life's hardship" for no upside is something that's a virtue or otherwise important/right to do. This is more like an expression of a philosophy that life can be completed (see also the ending of the series 'The Good Place,') or, somewhat differently, that there's no need to prolong it after the best (and still-good) times are now over. If that's someone's attitude, let them have it.
Overall, I respect Theo Boer for both the work he's done for terminally ill patients in the early stages of the assisted suicide program in the Netherlands and for speaking out against the assisted suicide practice after it went in a direction that he no longer could support. At the same time, I think he has an attitude towards the topic that I don't agree with. In my view, he doesn't seem to take seriously how bad it is to suffer, and especially, how bad and pointless it is to suffer for no good reason.
It might well be true that doctors and mental health professionals are now okay with assisted suicide as a solution too quickly (without trying other avenues first), but I'm not sure I'd trust Theo Boer's judgment on this, given the significant differences in our points of view. In any case, I acknowledge this is a risk and that we should take steps to make sure this doesn't occur or doesn't become too strong of an issue (and people who decide to go through with it should be well-informed about other options and encouraged to try these other options in case they haven't already been doing this to no success for many years).
Assisted Suicide Watch
A psychiatrist overstepping their qualifications by saying "It's never gonna get any better" (particularly when the source of the suffering is at least partly BPD, for which it's commonly known that symptoms can get better in someone's 40s) clearly should never happen.
However, I'd imagine that most mental health professionals would be extremely careful when making statements about whether there's hope for things to get better. In fact, there are probably guidelines around that.
Maybe it didn't happen this way at all: I notice I'm confused.
This could just be careless reporting by the newspaper.
The article says:
She recalled her psychiatrist telling her that they had tried everything, that “there’s nothing more we can do for you. It’s never gonna get any better.”
Was it really the psychiatrist who added "It's never gonna get any better," or was it just that the psychiatrist said "There's nothing more we can do for you," and then Zoraya herself (the person seeking assisted suicide) told the reporters her conclusion "It's never going to get any better," and the reporters wrote it as though she ascribed those words to the psychiatrist?
In any case, this isn't a proper "watch" ("assisted suicide watch") if you only report when you find articles that make the whole thing seem slippery-slopy. (And there's also a question of "how much is it actually like that?" vs "how much is it in the reporting?" – maybe the reporter had their own biases in writing it like that. For all we know, this person, Zoraya, has had this plan ever since she was a teenager, and gave herself 25 years to stop feeling suicidal, and now it's been enough. And the reporter just chose to highlight a few things that sound dramatic, like the bit about not wanting to inconvenience the boyfriend with having to keep the grave tidy.)
I feel like the response here should be: Think hard about what sorts of guidelines we can create for doctors or mental health professionals to protect against risks of sliding down a slippery slope. It's worth taking some risks because it seems really bad as well to err in the other direction (as many countries and cultures still do). Besides, it's not straightforwardly evidence of a slippery slope simply because the numbers went up or seem "startling," as the article claims. These developments can just as plausibly be viewed as evidence for, "startlingly many people suffer unnecessarily and unacceptably without these laws." You have to look into the details to figure out which one it is, and it's gonna be partly a values question rather than something we can settle empirically.
There are other written-about cases like Lauren Hoeve quite recently, also from the Netherlands, who'd suffered from debilitating severe myalgic encephalomyelitis (ME) for five years and began her assisted suicide application in 2022. Anyone interested in this topic should probably go through more of these accounts and read sources directly from the people themselves (like blogposts explaining their decision) rather than just media reporting about it.
If you know you have a winning hand, you do not want your opponent to fold, you want them to match your bet. So you kinda have to balance optimizing for the maximum pool at showdown with limiting the information you are leaking so there is a showdown. Or at least it would seem like that to me, I barely know the rules.
This is pretty accurate.
For simplicity, let's assume you have a hand that has a very high likelihood of winning at showdown on pretty much any runout. E.g., you have KK on a flop that is AK4, and your opponent didn't raise you before the flop, so you can mostly rule out AA. (Sure, if an A comes then A6 now beats you, or maybe they'll have 53s for a straight draw to A2345 with a 2 coming, or maybe they somehow backdoor into a different straight or flush depending on the runout and their specific hand – but those outcomes where you end up losing are unlikely enough to not make a significant difference to the math and strategy.)
The part about information leakage is indeed important, but rather than adjusting your bet sizing to prevent information leakage (i.e., "make the bet sizing smaller so it's less obvious that I've got a monster"), you simply add the right number of bluffs to your big-bet line to make your opponent's bluff-catching hands exactly indifferent. So, instead of betting small with KK to "keep them in" or "disguise the strength of your hand," you still bomb it, but you'd play the same way with a hand like J5ss (which can pick up a gutshot straight draw on any of the following turn cards: 2, 3, T, or Q, and can pick up a flush draw on any spade turn if there was already one spade on the flop).
To optimize for the maximum pot at showdown and maximum likelihood of getting called for all the chips, you want to bet the same proportion of the pot on each street (flop, turn, and river) to get all-in with the last bet. (This is forcing your opponent to defend the most; if you make just one huge bet of all-in right away, your opponent mathematically has to call you with fewer hands to prevent you from automatically profiting with every hand as a bluff.)
So, say the pot starts out at 6 big blinds (you raise 2.75x, get called by the big blind, and there's a 0.5 small blind in there as well), and your stack was 100 big blinds to start. If you were to bet 100% of the pot on each street, this would be 6+18+54, which is 78, so slightly too small (you want it to sum to 100, or rather 100 minus the preflop raise of 2.75 big blinds). So, with your very best hands, it's slightly better here to bet a little larger on each street, like 110% pot. (So you get something like 7 into 6 on the flop, 22 into 20 on the turn, and 70ish into 64 on the river for the rest of your stack.)
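As a sanity check on the "110% pot" figure, here's a small sketch of the geometric-sizing arithmetic (Python; `geometric_bet_fraction` is just an illustrative helper name, and the numbers are the ones from the example above):

```python
def geometric_bet_fraction(pot, stack, streets=3):
    """Pot fraction f such that betting f * pot on each street (and getting called)
    exactly gets the remaining stack in by the last street.
    After each bet and call the pot grows by a factor of (1 + 2f), so the total
    amount bet over n streets is pot * ((1 + 2f)**n - 1) / 2."""
    return ((1 + 2 * stack / pot) ** (1 / streets) - 1) / 2

# Numbers from the example: ~6bb pot on the flop, ~97bb behind after the preflop raise.
pot, stack = 6.0, 97.25
f = geometric_bet_fraction(pot, stack)
print(f"bet ~{f:.0%} of the pot on each street")  # ~111%, i.e. the "110% pot" above
for street in ("flop", "turn", "river"):
    bet = f * pot
    print(f"{street}: bet {bet:.1f} into {pot:.1f}")
    pot += 2 * bet
```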
Let's say the board runs out very dry AK469, no flushes possible. You have KK, your opponent has A2 as a bluffcatcher (or AT, doesn't make a difference here because you wouldn't want to take this line with anything worse for value than AQ probably). If you get the ratio of bluffs-to-value exactly right, then your opponent is now faced with a choice: Either forfeit what's in the pot right now (zero further EV for them), or look you up with a bluffcatcher (also zero EV – they win when you're bluffing, but they lose when you've got it). If they overfold, you always win what's already in the pot and your bluffs are printing money. If they overcall, your bluffs become losing plays (and if you knew the opponent was overcalling, you'd stop bluffing!), but your value hands get paid off more often than they should in game theory, so the overall outcome is the same.
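For the curious, the indifference numbers for a single river bet can be computed directly. Here's a minimal sketch (Python; single-street logic only, using the rough river numbers from the example, so it ignores the complications from earlier streets and card removal):

```python
def bluff_fraction(pot, bet):
    """Share of the river betting range that can be bluffs while keeping a pure
    bluffcatcher indifferent: bluff_freq * (pot + bet) == value_freq * bet."""
    return bet / (pot + 2 * bet)

def defense_frequency(pot, bet):
    """Share of the defender's bluffcatchers that must call so that a bluff
    (risking `bet` to win `pot`) doesn't automatically profit."""
    return pot / (pot + bet)

# Rough river numbers from the example: betting ~70 into a pot of ~64.
pot, bet = 64, 70
print(f"bluffs can make up ~{bluff_fraction(pot, bet):.0%} of the betting range")       # ~34%
print(f"defender should call with ~{defense_frequency(pot, bet):.0%} of bluffcatchers")  # ~48%
```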
Of course, what is a proper "bluff-catching hand" depends on the board and previous action. If someone thinks any pair is a good bluffcatcher, but some of your bluffs include a pair that beats their pair, then they're committing a huge blunder if they call. (E.g., they might consider bluffcatching with the pocket pair 88 on this AK469 runout, but you might be bluffing J9s on the river, which is actually reasonable since you get remaining Kx to fold and some Ax to fold and your 9 is making it less likely that they backed into an easy-to-call two pair with K9 or A9. So, if you bluff 9x on the river, they're now giving you a present by ever calling with 88 [even if you have some other bluffs that they beat, their call is still losing lots of money in expectation because the bluff-catching ratio is now messed up.]) Or, conversely, if they fold a hand that they think is a bluffcatcher, but it actually beats some of your thinnest value hands (or at least ties with them), they'd again be blundering by folding. So, for instance, if someone folds AQ here, it could be that you went for a thin value bet with AQ yourself and they now folded something that not only beats all your bluffs, but also ties with some value (again changing the ratio favorably to make the call fairly highly +EV). (Note that a call can be +EV even if you're losing a little more than 50% of the time, because there's already a fair bit of money in the pot before the last bet).
The above bet-sizing example had some properties that made the analysis easy:
- Your KK didn't block the hands you most want to get value from (Ax hands).
[KK is actually a less clear example for betting big than 44, because at least on the flop and turn, a big bet also gets value from lots of Kx hands (which KK blocks). So 44 prefers betting big even more than KK because it gets paid more.]
- The board AK4 was pretty static – hands that are good on the flop tend to still be good by the river. (Compare this to the "wet" board 8s75s, the "ss" signifying that there's a flush draw. Say you have 88 on this board for three of a kind 8s. It's unlikely your opponent has exactly 96 or 64 for a straight, so you're likely ahead at the moment. But will you still be ahead if a 6, a T, or a 4 comes, or if the flush comes in? Who knows, so things can change radically with even just one more card!)
- If we assume that you raised before the flop and got merely called by the big blind (everyone else folded), you have the advantage of top hands on this board because the big blind is really supposed to always raise AK, KK, and AA. "Trapping" them for surprise value would simply not be worth it: those hands win a lot more by making the pot bigger before the flop. Having the advantage in top hands allows you to bet really big.
These three bullet points all highlight separate reasons why a hand that is great on the flop shouldn't always go for the sizing to get all-in by the river with three proportional big bets ("geometric sizing"). To express them in simple heuristics:
- Bet smaller (or throw in a check) when you block the main hands you want to get called by.
- Bet smaller (or throw in a check) when a lot can still change later in the hand because of how "wet" the board is.
- Bet smaller (or throw in a check) if your opponent has the advantage of top hands – this means they should in theory do a lot of the betting for you and bet big and include bluffs. (Of course, if your opponent is generally too passive, it's better for you to do the betting yourself even if it's not following optimal theory.)
There's also a concept where you want to make sure you still have some good hands in nodes of the game where you only make a small bet or even check, so your opponent cannot always pounce on these signs of weakness. However, it's often sufficient to allocate your second-tier hands there; you don't need to "protect your checking range" with something as strong as KK. This would be like allocating Achilles to defending the boats at the beach when the rest of your army storms against the walls of Troy. You want someone back there with the boats, but it doesn't have to be your top fighter.
Lastly, one other interesting situation where you want to check strong hands is if you think that the best way to get the money in is "check in order to check-raise" rather than "bet big outright." That happens when you're out of position (your opponent will still have the option to bet after you check) and your opponent's range is capped to hands that are worth one big bet, but are almost never worth enough to raise against your big bet. In that situation, you get extra money from all their bluffs if you check, and the hands that would've called your big bet had you bet will bet for you anyway, and then have to make a tough decision against your check-raise.

So, let's say you're the big blind now on AK4 and you hold 99. This time, your opponent checks back the flop, indicating that they're unlikely to have anything very strong (except maybe AA that can't easily get paid). After they check back, the turn is a 6. You both check once again. At this point, you're pretty sure that the best hands in your opponent's range are one pair at best, because it just wouldn't make sense for them to keep the pot small and give you a free river card with a hand that can get paid off. The river is a 9 (bingo for you!). In theory, your opponent could have 99 themselves or back into two pair with K9, 96s, or A9 that weirdly enough bet neither the flop nor the turn. However, since you have two 9s, you're blocking those strong hands pretty hard. As a result, if you were to bet big here with 99, the best you can realistically hope for is getting called once. That's a disaster and you want to get more value. So, you check.

Your opponent will now bet KQ for value (since most of your Ax would've bet the river) and will bet all the weak Ax for value that they got to the river with in the check-flop, check-turn line. Against this bet, your 99 (and also A9 to some degree, but having the A isn't ideal because you want to get called by an A) can now jam all-in for 10 times the size of the pot to make all of your opponent's hands except rivered A9 indifferent. Any smaller raise size wouldn't make a lot of sense because you're basically never getting raised back, so why not go for maximum value (or maximum pressure, in case you're bluffing)? Your bluffs to balance this play should all contain a 9 themselves, so you again make it less likely that your opponent holds 99, A9, or K9. (Note that 9x can also function as a break-even call against a river bet after you check, catching your opponent's river bluffs. However, since 9x never beats value, you're not really wasting your hand strength by turning it into a bluff some of the time.)

If you have the right ratio of bluffs to value for your 10x-the-pot overbet, your opponent is supposed to fold all Kx and most Ax hands, but some Ax has to defend because it blocks the A9 that you also go for value with. So, the opponent has to occasionally call this crazy big raise with just an ace. If they never do this, and you know they never do it, you can check-jam every single 9x on the river and it makes for a +EV bluff. (You probably cannot just check-jam every single hand, because if you don't have a 9, the chances that your opponent has A9, 99, or K9 are drastically increased, and you'll always get called by those.)
Of course, if your opponent correctly guesses that you're jamming all your 9x there, they realize that you have a lot more 9x-that-is-just-one-pair than 99/A9/K9, so they'll now exploitatively call with all their Ax. Since you're now betting 10x the pot as a bluff far too often, you're losing way more than you were standing to gain, so you have to be especially careful with exploits like this when betting many times the pot on the river.
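To put rough numbers on why that exploit is so punishing, here's a small sketch (Python; it treats the check-raise jam as a simple 10x-pot bet and uses arbitrary units, so the figures are illustrative only):

```python
def defense_frequency(pot, bet):
    # The defender must continue pot / (pot + bet) of the time
    # to stop a bluff (risking `bet` to win `pot`) from auto-profiting.
    return pot / (pot + bet)

pot = 10          # arbitrary units
jam = 10 * pot    # the 10x-pot check-raise jam from the example

# In theory the opponent only has to find a call with ~9% of their bluffcatchers...
print(f"required defense: {defense_frequency(pot, jam):.0%} of bluffcatchers")

# ...but the payoff asymmetry is brutal if they simply call you down:
win_when_they_fold = pot    # a successful bluff picks up one pot
loss_when_they_call = jam   # a called bluff loses the whole 10x-pot jam
print(f"bluff wins {win_when_they_fold} when they fold vs. loses {loss_when_they_call} when they call")
```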
I really liked this post! I will probably link to it in the future.
Edit: Just came to my mind that these are things I tend to think of under the heading "considerateness" rather than kindness, but it's something I really appreciate in people either way (and the concepts are definitely linked).
FWIW, one thing I really didn't like about how he came across in the interview is that he seemed to be engaged in framing the narrative one-sidedly in an underhanded way, sneakily rather than out in the open. (Everyone tries to frame the narrative in some way, but it becomes problematic when people don't point out the places where their interpretation differs from others, because then listeners won't easily realize that there are claims that they still need to evaluate and think about rather than just take for granted and something that everyone else already agrees about.)
He was not highlighting the possibility that the other side's perspective still has validity; instead, he was sweeping that possibility under the carpet. He talked as though (implicitly, not explicitly) it's now officially established or obviously true that the board acted badly (Lex contributed to this by asking easy questions and not pushing back on anything too much). He focused a lot on the support he got during this hard time and on people saying good things about him (the "eulogy while still alive" comparison, highlighting that he thinks there's no doubt about his character) and said somewhat condescending things about the former board (about how he thinks they had good intentions, said in that slow voice and thoughtful tone, almost as if they had committed a crime), and then emphasized their lack of experience.
For contrast, here are things he could have said that would have made it easier for listeners to come to the right conclusions (I think anyone who is morally scrupulous about whether they're in the right in situations when many others speak up against them would have highlighted these points a lot more, so the absence of these bits in Altman's interview is telling us something.)
- Instead of just saying that he believes the former board members came from a place of good intentions, also say if/whether he believes that some of the things they were concerned about weren't totally unreasonable from their perspective. E.g., acknowledge things he did wrong or things that, while not wrong, understandably would lead to misunderstandings.
- Acknowledge that just because a decision had been made by the review committee, the matter of his character and suitability for OpenAI's charter is not now settled (esp. given that the review maybe had a somewhat limited scope?). He could point out that it's probably rational (or, if he thinks this is not necessarily mandated, at least flag that he'd understand if some people now feel that way) for listeners of the YouTube interview to keep an eye on him, while explaining how he intends to prove that the review committee came to the right decision.
- He said the board was inexperienced, but he'd say that in any case, whether or not they were onto something. Why is he talking about their lack of experience so much rather than zooming in on their ability to assess someone's character? It could totally be true that the former board was both inexperienced and right about Altman's unsuitability. Pointing out this possibility himself would be a clarifying contribution, but instead, he chose to distract from that entire theme and muddy the waters by making it seem like all that happened was that the board did something stupid out of inexperience, and that's all there was.
- Acknowledge that it wasn't just an outpouring of support for him; there were also some people who used the occasion to voice critical takes about him (and the Y Combinator thing came to light).
(Caveat that I didn't actually listen to the full interview and therefore may have missed it if he did more signposting and perspective taking and "acknowledging that for-him-inconvenient hypotheses are now out there and important if true and hard to dismiss entirely for at the very least the people without private info" than I would've thought from skipping through segments of the interview and Zvi's summary.)
In reaction to what I wrote here, maybe it's a defensible stance to go like, "ah, but that's just Altman being good at PR; it's just bad PR for him to give any air of legitimacy to the former board's concerns."
I concede that, in some cases when someone accuses you of something, they're just playing dirty and your best way to make sure it doesn't stick is by not engaging with low-quality criticism. However, there are also situations where concerns have enough legitimacy that sweeping them under the carpet doesn't help you seem trustworthy. In those cases, I find it extra suspicious when someone sweeps the concerns under the carpet and thereby misses the opportunity to add clarity to the discussion, make themselves more trustworthy, and help people form better views on what's the case.
Maybe that's a high standard, but I'd feel more reassured if the frontier of AI research was steered by someone who could talk about difficult topics and uncertainty around their suitability in a more transparent and illuminating way.
There are realistic beliefs Altman could have about what's good or bad for AI safety that would not allow Zvi to draw that conclusion. For instance:
- Maybe Altman thinks it's really bad for companies' momentum to go through CEO transitions (and we know that he believes OpenAI having a lot of momentum is good for safety, since he sees them as both adequately concerned about safety and more concerned about it than competitors).
- Maybe Altman thinks OpenAI would be unlikely to find another CEO who understands the research landscape well enough while also being good at managing, who is at least as concerned about safety as Altman is.
- Maybe Altman was sort of willing to "put that into play," in a way, but his motivation wasn't a desire for power, nor a calculated strategic ploy, but more the understandable human tendency to hold a grudge (esp. in the short term) against the people who just rejected and humiliated him, so he understandably didn't feel much motivational pull to help them look better about the coup they had just attempted for what seemed to him like unfair/bad reasons. (This still makes Altman look suboptimal, but it's a lot different from "Altman prefers power so much that he'd calculatedly put the world at risk for his short-term enjoyment of power.")
- Maybe the moments where Altman thought things would go sideways were only very brief, and for the most part, when he was taking actions towards further escalation, he was already very confident that he'd win.
Overall, the point is that it seems maybe a bit reckless/uncharitable to make strong inferences about someone's ranking of priorities just because one remark they made is in tension with them pushing in one direction rather than the other in a complicated political struggle.
Small edges are why there's so much money gambled in poker.
It's hard to reach a skill level where you win money on 50% of nights, but it's not that hard to reach a point where you're "only" losing 60% of the time. (That's still significantly worse than playing roulette, but compared to chess competitions where hobbyists never win any sort of prize, you've at least got chances.)
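A minimal Monte Carlo sketch of the point above, using made-up numbers (they aren't from any poker source): a slightly losing player with an expected value of roughly -0.1 big blinds per hand and a per-hand standard deviation of roughly 5 big blinds, playing 200 hands per night. Because nightly variance swamps the tiny negative edge, such a player still books winning nights surprisingly often.

```python
import random

# Illustrative, assumed numbers only: a slightly losing player.
HANDS_PER_NIGHT = 200
EV_PER_HAND = -0.1   # big blinds; a small negative edge
SD_PER_HAND = 5.0    # big blinds; per-hand variance dwarfs the edge
NIGHTS = 20_000      # number of simulated nights

winning_nights = 0
for _ in range(NIGHTS):
    # Approximate each hand's result as a draw from a normal distribution.
    night_result = sum(
        random.gauss(EV_PER_HAND, SD_PER_HAND) for _ in range(HANDS_PER_NIGHT)
    )
    if night_result > 0:
        winning_nights += 1

print(f"Share of winning nights: {winning_nights / NIGHTS:.1%}")
# With these assumed inputs this prints roughly 39%: the player loses money
# in expectation on every single hand, yet still ends up ahead about 4 nights
# out of 10, i.e. "only" loses ~60% of the time.
```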
You criticize Altman for pushing ahead with dangerous AI tech, but then most of what you'd spend the money on is pushing ahead with tech that isn't directly dangerous. Sure, that's better. But it doesn't solve the issue that we're headed into an out-of-control future. Where's the part where we use money to improve the degree to which thoughtful high-integrity people (or prosocial AI successor agents with those traits) are able to steer where this is all going?
(Not saying there are easy answers.)
I mean, personality disorders are all about problems in close interpersonal relationships (or a lack of interest in such relationships, in the case of schizoid personality disorder), and trust is always really relevant in such relationships, so I think this could be a helpful lens for looking at things. At the same time, I'd be very surprised if you could derive new helpful treatment approaches from this sort of armchair reasoning (even just at the level of hypothesis generation to be subjected to further testing).
Also, some of these seem a bit strained:
- Narcissistic personality disorder seems to be more about superiority and entitlement than expecting others to be trusting. And narcissism is correlated with Machiavellianism, where a feature of that is having a cynical worldview (i.e., thinking people in general aren't trustworthy). If I had to frame narcissism in trust terms, I'd maybe say it's an inability to value or appreciate trust?
- Histrionic personality disorder has a symptom criterion of "considers relationships to be more intimate than they actually are." I guess maybe you could say "since (by your hypothesis) they expect people to not care, once someone cares, a person with histrionic personality disorder is so surprised that they infer that the relationship must be deeper than it is." A bit strained, but maybe can be made to fit.
- Borderline: I think there's more of a pattern to splitting than randomness (e.g., you rarely have splitting in the early honeymoon stage of a relationship), so maybe something like "fluctuating" would fit better. But also, I'm not sure what fluctuates is always about trust. Sure, sometimes splitting manifests in accusing the partner of cheating out of nowhere, but in other cases, the person may feel really annoyed at the partner in a way that isn't related to trust. (Or it could be related to trust, but going in a different direction: they may resent the partner for trusting them because they have such a low view of themselves that anyone who trusts them must be unworthy.)
- Dependent: To me the two things you write under it seem to be in tension with each other.
Edit:
Because it takes eight problems currently considered tied up with personal identity and essentially unsolvable [...]
I think treatment success probabilities differ between personality disorders. For some, calling them "currently considered essentially unsolvable" seems wrong.
And I'm not sure how much of OCPD is explained by calling it a persistent form of OCD – they seem very different. You'd expect "persistent" to make something worse, but OCPD tends to be less of an issue for the person who has it (though it can be difficult for others around them). Also, some symptoms seem to be non-overlapping: with OCPD, I don't think intrusive thoughts play a role (I might be wrong?), whereas intrusive thoughts are a distinct and telling feature of some presentations of OCD.
Dilemma:
- If the Thought Assessors converge to 100% accuracy in predicting the reward that will result from a plan, then a plan to wirehead (hack into the Steering Subsystem and set reward to infinity) would seem very appealing, and the agent would do it.
- If the Thought Assessors don’t converge to 100% accuracy in predicting the reward that will result from a plan, then that’s the very definition of inner misalignment!
[...]
The thought “I will secretly hack into my own Steering Subsystem” is almost certainly not aligned with the designer’s intention. So a credit-assignment update that assigns more positive valence to “I will secretly hack into my own Steering Subsystem” is a bad update. We don’t want it. Does it increase “inner alignment”? I think we have to say “yes it does”, because it leads to better reward predictions! But I don’t care. I still don’t want it. It’s bad bad bad. We need to figure out how to prevent that particular credit-assignment Thought Assessor update from happening.
[...]
I think there’s a broader lesson here. I think “outer alignment versus inner alignment” is an excellent starting point for thinking about the alignment problem. But that doesn’t mean we should expect one solution to outer alignment, and a different unrelated solution to inner alignment. Some things—particularly interpretability—cut through both outer and inner layers, creating a direct bridge from the designer’s intentions to the AGI’s goals. We should be eagerly searching for things like that.
Yeah, there definitely seems to be something off about that categorization. I've thought a bit about how this stuff works in humans, particularly in this post of my moral anti-realism sequence. To give some quotes from that:
One of many takeaways I got from reading Kaj Sotala’s multi-agent models of mind sequence (as well as comments by him) is that we can model people as pursuers of deep-seated needs. In particular, we have subsystems (or “subagents”) in our minds devoted to various needs-meeting strategies. The subsystems contribute behavioral strategies and responses to help maneuver us toward states where our brain predicts our needs will be satisfied. We can view many of our beliefs, emotional reactions, and even our self-concept/identity as part of this set of strategies. Like life plans, life goals are “merely” components of people’s needs-meeting machinery.[8]
Still, as far as components of needs-meeting machinery go, life goals are pretty unusual. Having life goals means to care about an objective enough to (do one’s best to) disentangle success on it from the reasons we adopted said objective in the first place. The objective takes on a life of its own, and the two aims (meeting one’s needs vs. progressing toward the objective) come apart. Having a life goal means having a particular kind of mental organization so that “we” – particularly the rational, planning parts of our brain – come to identify with the goal more so than with our human needs.[9]
[...]
There’s a normative component to something as mundane as choosing leisure activities. [E.g., going skiing in the cold, or spending the weekend cozily at home.] In the weekend example, I’m not just trying to assess the answer to empirical questions like “Which activity would contain fewer seconds of suffering/happiness” or “Which activity would provide me with lasting happy memories.” I probably already know the answer to those questions. What’s difficult about deciding is that some of my internal motivations conflict. For example, is it more important to be comfortable, or do I want to lead an active life? When I make up my mind in these dilemma situations, I tend to reframe my options until the decision seems straightforward. I know I’ve found the right decision when there’s no lingering fear that the currently-favored option wouldn’t be mine, no fear that I’m caving to social pressures or acting (too much) out of akrasia, impulsivity or some other perceived weakness of character.[21] We tend to have a lot of freedom in how we frame our decision options. We use this freedom, this reframing capacity, to become comfortable with the choices we are about to make. In case skiing wins out, then “warm and cozy” becomes “lazy and boring,” and “cold and tired” becomes “an opportunity to train resilience / apply Stoicism.” This reframing ability is a double-edged sword: it enables rationalizing, but it also allows us to stick to our beliefs and values when we’re facing temptations and other difficulties.
[...]
Visualizing the future with one life goal vs. another
Whether a given motivational pull – such as the need for adventure, or (e.g.,) the desire to have children – is a bias or a fundamental value is not set in stone; it depends on our other motivational pulls and the overarching self-concept we’ve formed.
Lastly, we also use “planning mode” to choose between life goals. A life goal is a part of our identity – just like one’s career or lifestyle (but it’s even more serious). We can frame choosing between life goals as choosing between “My future with life goal A” and “My future with life goal B” (or “My future without a life goal”). (Note how this is relevantly similar to “My future on career path A” and “My future on career path B.”)
[...]
It’s important to note that choosing a life goal doesn’t necessarily mean that we predict ourselves to have the highest life satisfaction (let alone the most increased moment-to-moment well-being) with that life goal in the future. Instead, it means that we feel the most satisfied about the particular decision (to adopt the life goal) in the present, when we commit to the given plan, thinking about our future. Life goals inspired by moral considerations (e.g., altruism inspired by Peter Singer’s drowning child argument) are appealing despite their demandingness – they can provide a sense of purpose and responsibility.
So, it seems like we don't want "perfect inner alignment," at least not if inner alignment is about accurately predicting reward and then forming the plan of doing whatever gives you the most reward. Also, there's a concept of "lock-in," or "identifying more with the long-term planning part of your brain than with the underlying needs-meeting machinery." Lock-in can be dangerous (if you lock in something that isn't automatically corrigible), but it might also be dangerous not to lock in anything (because that means you don't know what other goals will form later on).
Idk, the whole thing seems to me like brewing a potion in Harry Potter, except that you don't have a recipe book and there's luck involved, too. "Outer alignment," a minimally sufficient degree thereof (as in: the agent tends to get rewards when it takes actions towards the intended goal), increases the likelihood that the agent gets broadly pointed in the right direction, so the intended goal at least gets considered among the things the internal planner might reinforce itself around / orient itself towards. But then, whether the intended goal gets picked over other alternatives (instrumental requirements for general intelligence, or alien motivations the AI might initially have), who knows. Like with raising a child, sometimes they turn out the way the parents intend, sometimes not at all. There's probably a science to making the intended outcomes more likely, but even if we could do that with human children developing into adults with fixed identities, there's still the question of how to find analogous patterns in (brain-like) AI. Tough job.
Conditioned Taste Aversion (CTA) is a phenomenon where, if I get nauseous right now, it causes an aversion to whatever tastes I was exposed to a few hours earlier—not a few seconds earlier, not a few days earlier, just a few hours earlier. (I alluded to CTA above, but not its timing aspect.) The evolutionary reason for this is straightforward: a few hours is presumably how long it typically takes for a toxic food to induce nausea.
That explains why my brother no longer likes mushrooms. When we were little, he liked them and we ate mushrooms at a restaurant, then were driven through curvy mountain roads later that day with the family. He got car sick and vomited, and afterwards he had an intense hatred for mushrooms.
Is that sort of configuration even biologically possible (or realistic)? I have no deep immunology knowledge, but I think bad reactions to vaccines have little to nothing to do with whether you're up to date on previous vaccines. So far, I don't think we're good at predicting who will react with more severe side effects than average (and even if we could, it's not like it's easy to tweak the vaccine, except for tradeoff-y things like lowering the vaccination dose).
My point is that I have no evidence that he ended up reading most of the relevant posts in their entirety. I don't think people who read all the posts in their entirety should just go ahead and unilaterally dox discussion participants, but I feel like people who have only read parts of it (or only secondhand sources) should do it even less.
Also, at the time, I interpreted Roko's "request for a summary" more as a way for him to sneer at people. His "summary" had a lot of loaded terms and subjective judgments in it. Maybe this is a style thing, but I think people should only write summaries like that (if at all) when they're already well-informed. (E.g., Zvi's writing style can be like that, and I find it fine because he's usually really well-informed. But if I saw him make a half-assed take on something he didn't seem well-informed about, I'd downvote.)
See my comment here.
Kat and Emerson were well-known in the community and they were accused of something that would cause future harm to EA community members as well. By contrast, Chloe isn't particularly likely to make future false allegations even based on Nonlinear's portrayal (I would say). It's different for Alice, since Nonlinear claim she has a pattern. (But with Alice, we'd at least want someone to talk to Nonlinear in private and verify how reliable they seem about negative info they have about Alice, before simply taking their word for it based on an ominous list of redacted names and redacted specifics of accusations.)
Theoretically Ben could have titled his post, "Sharing Information About [Pseudonymous EA Organization]", and requested the mods enforce anonymity of both parties, right?
That would miss the point, rendering the post almost useless. The whole point is to prevent future harm.
but not for Roko to unilaterally reveal the names of Alice and Chloe?
Alice and Chloe had Ben, who is a trusted community member, look into their claims. I'd say Ben is at least somewhat "on the hook" for the reliability of the anonymous claims.
By contrast, Roko posted a 100-word summary of the Nonlinear incident that got a large number of net downvotes, so he seems to be particularly poorly informed about what even happened.
Some conditions for when I think it's appropriate for an anonymous source to make a critical post about a named someone on the forum:
- Is the accused a public person or do they run an organization in the EA or rationality ecosystem?
- Or: Is the type of harm the person is accused of something that the community benefits from knowing?
- Did someone who is non-anonymous and trusted in the community talk to the anonymous accuser and verify claims and (to some degree*) stake their reputation for them?
*I think there should be a role of "investigative reporter:" someone verifies that the anonymous person is not obviously unreliable. I don't think the investigative reporter is 100% on the hook for anything that will turn out to be false or misleading, but they are on the hook for things like doing a poor job at verifying claims or making sure there aren't any red flags about a person.
(It's possible for anonymous voices to make claims without the help of an "investigative reporter;" however, in that case, I think the appropriate community reaction should be to give little-to-no credence to such accusations. After all, they could be made by someone who already has had their reputation justifiably tarnished.)
On de-anonymizing someone (and preventing an unfair first-mover advantage):
- In situations where the accused parties are famous and have lots of influence, we can view anonymity protection as evening the playing field rather than conferring an unfair advantage. (After all, famous and influential people already have a lot of advantages on their side – think of Sam Altman in the conflict with the OpenAI board.)
- If some whistleblower displays a pattern/history of making false accusations, that implies potential for future harm, so it seems potentially appropriate to warn others about them (but you'd still want to be cautious, take your time to evaluate evidence carefully, and not fall prey to a smear campaign by the accused parties – see DARVO).
- If there's no pattern/history of false accusations, but the claims by a whistleblower turn out to be misleading in more ways than one would normally expect in the heat of things (but not egregiously so), then the situation is going to be unsatisfying, but personally I'd err on the side of protecting anonymity. (I think the case for this is strongest when the accused parties are more powerful/influential than the accusers.) I'd definitely protect anonymity if the accusations continue to seem plausible but are impossible to prove/there remains lots of uncertainty.
- I think de-anonymization, if it makes sense under some circumstances, should only be done after careful investigation, and never "in the heat of the moment." In conflicts that are fought publicly, it's very common for different sides to gain momentum temporarily but then lose it again, depending on who had the last word.
Very thoughtful post. I liked that you delved into this out of interest even though you aren't particularly involved in this community, but then instead of just treating it as fun but unproductive gossip, you used your interest to make a high-value contribution!
It changed my mind in some places (I had a favorable reaction to the initial post by Ben; also, I still appreciate what Ben tried to do).
I will comment on two points that I didn't like, but I'm not sure to what degree this changes your recommended takeaways (more on this below).
They [Kat and Emerson] made a major unforced tactical error in taking so long to respond and another in not writing in the right sort of measured, precise tone that would have allowed them to defuse many criticisms.
I don't like that this makes it sound like it's only (or mostly) about tone.
I updated that the lawsuit threat was indeed more about tone than I initially thought. I initially thought that any threat of a lawsuit is strong evidence that someone is a bad actor. I now think it's sometimes okay to mention the last resort of lawsuits if you think you're about to be defamed.
At the same time, I'd say it was hard for Lightcone to come away with that interpretation when Emerson used phrases like 'maximum damages permitted by law' (a phrasing optimized for intimidation). Emerson did so in a context where one of the things he was accused of was unusually hostile negotiation and intimidation tactics! So, given the context and "tone" of the lawsuit threat, I feel like it made a lot of sense for Lightcone to see their worst concerns about Emerson "confirmation-boosted" when he made that lawsuit threat.
In any case, and more to my point about tone vs. other things, I want to speak about the newer update by Nonlinear that came three months after the original post by Ben. Criticizing tone there is like saying "they lack expert skills at defusing tensions; not ideal, but also let's not be pedantic." It makes it sound like all they need to become great bosses is a bit of tactfulness training. However, I think there are more fundamental things to improve on, and these things lend a bunch of credibility to why someone might have a bad time working with them. (Also, they had three months to write that post, and it's quite optimized for presentation in several ways, so it's not like we should apply low standards to it.) I criticized some aspects of their post here and here.

In short, I feel like they reacted by (1) conceding little to nothing they could have done differently and (2) going on the attack with outlier-y black-and-white framings against not just Alice, but also Chloe, in a way that I think is probably more unfair/misleading/uncharitable about Chloe than what Chloe said about them. (I say "probably" because I didn't spend a lot of time re-reading Ben's original post and trying to separate which claims were made by Alice vs. Chloe, doing the same for Nonlinear's reply, and checking whether their quotes-that-aren't-quotes ascribe statements to Chloe that she didn't actually say.) I think that's a big deal because their reaction pattern-matches to how someone would react if they did indeed have a "malefactor" pattern of frequently causing interpersonal harm. Just like it's not okay to make misleading statements about others solely because you struggled with negative emotions in their presence, it's also (equally) not okay to make misleading statements solely because someone is accusing you of being a bad boss or leader. It can be okay to see red in the heat of battle, but it's an unfortunate dynamic because it blurs the line between people who are merely angry and hurt and people who are character-wise incapable of reacting appropriately to appropriate criticism. (This also goes into the topic of "adversarial epistemology" – if you think the existence of bad actors is a sufficient problem, you want to create social pressure for good-but-misguided actors to get their shit together and stop acting in a pattern that lends cover to bad actors.)
Eliezer recently re-tweeted this dismissive statement about DARVO. I think this misses the point. Sure, if the person who accuses you is a malicious liar, or is deluded to a point where it has massively destructive effects and is a pattern, then, yeah, you're forced to fight back. So, point taken: sometimes the person who appears to be the victim initially isn't actually the victim. However, other times the truth is at least somewhat towards the middle, i.e., the person accusing you of something may have some points. In that case, you can address what happened without character-assassinating them in return, especially if you feel like you had a lot of responsibility for them having had a bad time. Defending Alice is not the hill I want to die on (although I'm not saying I completely trust Nonlinear's picture of her), but I really don't like the turn things took towards Chloe. I feel like it's messed up that several commenters (at one point my comment here had 9 votes and -5 overall karma, and high disagreement votes) came away with the impression that it might be appropriate to issue a community-wide warning about Chloe as someone with a pattern of being destructive (and to de-anonymize her, which would further send the signal that the community considers her a toxic person). I find that a really scary outcome for whistleblower norms in the community. Note that this isn't because I think it's never appropriate to de-anonymize someone.
Here are the list of values that are important to me about this whole affair and context:
- I want whistleblower-type stuff to come to light because I think the damage bad leaders can do is often very large
- I want investigations to be fair. In many cases, this means giving accused parties time to respond
- I understand that there’s a phenotype of personality where someone has a habit of bad-talking others through false/misleading/distorted claims, and I think investigations (and analysis) should be aware of that
(FWIW, I assume that most people who vehemently disagree with me about some of the things I say in this comment and elsewhere would still endorse these above values.)
So, again, I'm not saying I find this a scary outcome because I have a "always believe the victim" mentality. (Your post fortunately doesn't strawman others like that, but there were comments on Twitter and facebook that pushed this point, which I thought was uncalled for.)
Instead, consider for a moment the world where I'm right that:
- Chloe isn't a large outlier on any relevant personality dimension, except perhaps that she was significantly below average at standing up for her interests/voicing her boundaries (which may even have been selected for in the Nonlinear hiring process)
This is what I find most plausible based on a number of data points. In that world, I think something about the swing of the social pendulum went wrong if the result of Chloe sharing her concerns is that things get worse for her. (I'm not saying this is currently the case – I'm saying it would be the case if we fully bought into Nonlinear's framing, or into the people making the most negative comments about both Chloe and Alice, without flagging that many people familiar with the issue thought Alice was a less reliable narrator than Chloe, etc.)
Of course, I focused a lot on a person who is currently anonymized. It's fair to say that this is unfair, given that Nonlinear have their reputation at stake out in the open. Like I said elsewhere, it's not like I think they deserved the full force of this.
These are tough tradeoffs to make. Unfortunately, we need some sort of policy to react to people who might be bad leaders. Among all the criticisms about Ben's specific procedure, I don't want this part to be de-emphasized.
The community mishandled this so badly and so comprehensively that inasmuch as Nonlinear made mistakes in their treatment of Chloe or Alice, for the purposes of the EA/LW community, the procedural defects have destroyed the case.
I'm curious what you mean by the clause "for the purposes of the EA/LW community." I don't want to put words into your mouth, but I'd be sympathetic to a claim that goes as follows. From a purely procedural perspective about what a fair process should look like for a community to decide that a particular group should be cut out from the community's talent pipeline (or whatever harsh measure people want to consider), it would be unfair to draw this sort of conclusion against Nonlinear given the many flaws in the process that was used. If that's what you're saying, I'm sympathetic to it, at the very least in the sense of "seems like a defensible view to me." (And maybe also overall – but I find it hard to think about this stuff and I'm a bit tired of the affair.)
At the same time, I feel like, as a private individual, it's okay to come away with confident beliefs (one way or the other) from this whole thing. It takes a higher bar of evidence (and assured fairness of procedure) to decide "the community should act as though x is established consensus" than it takes to yourself believe x.
An organization gets applications from all kinds of people at once, whereas an individual can only ever work at one org. It's easier to discreetly contact most of the most relevant parties about some individual than it is to do the same with an organization.
I also think it's fair to hold orgs that recruit within the EA or rationalist communities to slightly higher standards because they benefit directly from association with these communities.
That said, I agree with habryka (and others) that
I think if the accusations are very thoroughly falsified and shown to be highly deceptive in their presentation, I can also imagine some scenarios where it might make sense to stop anonymizing, though I think the bar for that does seem pretty high.
a) A lot of your points are specifically about Altman and the board, whereas many of my points started that way but then went into the abstract/hypothetical/philosophical. At least, that's how I meant it – I should have made this more clear. I was assuming, for the sake of the argument, that we're speaking of a situation where the person in the board's position found out that someone else is deceptive to their very core, with no redeeming principles they adhere to. So, basically what you're describing in your point "I" with the lizardpeople. I focused on that type of discussion because I felt like you were attacking my principles, and I care about defending my specific framework of integrity. (I've commented elsewhere on things that I think the board should or shouldn't have done, so I also care about that, but I probably already spent too many comments on speculations about the board's actions.)
Specifically about the actual situation with Altman, you say:
"I'm saying that you should honor the agreement you've made to wield your power well and not cruelly or destructively. It seems to me that it has likely been wielded very aggressively and in a way where I cannot tell that it was done justly."
I very much agree with that, fwiw. I think it's very possible that the board did not act with integrity here. I'm just saying that I can totally see circumstances where they did act with integrity. The crux for me is "what did they believe about Altman and how confident were they in their take, and did they make an effort to factor in moral injunctions against using their power in a self-serving way, etc?"
b) You make it seem like I'm saying that it's okay to move against people (and e.g. oust them) without justifying yourself later or giving them the chance to reply at some point when they're in a less threatening position. I think we're on the same page about this: I don't believe that it would be okay to do these things. I wasn't saying that you don't have to answer for what you did. I was just saying that it can, under some circumstances, be okay to act first and then explain yourself to others later and establish yourself as still being trustworthy.
c) About your first point (point "I"), I disagree. I think you're too deontological here. Numbers do count. Being unfair to someone who you think is a bad actor but turns out not to be has a victim count of one. Letting a bad actor take over the startup/community/world you care about has a victim count of way more than one. I also think it can be absolutely shocking how high this can go (in terms of the various types of harms caused by the bad tail of bad actors) depending on the situation. E.g., think of Epstein or dictators. On top of that, there are indirect bad effects that don't quite fit the name "victim count" but that still weigh heavily, such as distorted epistemics or the destruction of a high-trust environment when it gets invaded by bad actors.

Concretely, when you talk about the importance of the variable "respect towards Altman in the context of how much notice to give him," I'm mostly thinking: sure, it would be nice to be friendly and respectful, but that's a small issue compared to considerations like "if the board is correct, how much could he mobilize opposition against them if he had a longer notice period?" So, I thought three months' notice would be inappropriate given what's asymmetrically at stake on both sides of the equation. (It might change once we factor in optics and how much easier it is for Altman to mobilize opposition if he can say he was treated unfairly – for some reason, this always works wonders. DARVO is like dark magic. Sure, it sucks for Altman to lose the $100 billion company he built. But an out-of-control CEO recklessly building the most dangerous tech in the world sucks more, for way more people, in expectation.) In the abstract, I think it would reflect an unfair and inappropriate sense of what matters if the respect owed to a single accused bad actor counted for more than the harm their many victims would suffer in expectation.

And I'm annoyed that it feels like you took the moral high ground here by making it seem like my positions are immoral. But maybe you meant the "shame on yourself" for just one isolated sentence, and not my stance as a whole. I'd find that more reasonable. In any case, I understand now that you probably feel bothered for an analogous reason, namely that I made a remark about how it's naive to be highly charitable or cooperative under circumstances where I think that's no longer appropriate. I want to flag that nothing you wrote in your newest reply seems naive to me, even though I do find it misguided. (The only thing I thought was maybe naive was the point about three months' notice – though I get why you made it, and I generally really appreciate concrete examples like that of things the board could have done. I just think it would backfire when someone uses those months to make moves against you.)
d) The "shame on yourself" referred to something where you perceived me to be tribal, but I don't really get what that was about. You write "and (c) kind of saying your tribe is the only one with good people in it." This is not at all what I was kind of saying. I was saying my tribe is the only one with people who are "naive in such-and-such specific way" in it, and yeah, that was unfair towards EAs, but then it's not tribal (I self-identify as EA), and I feel like it's okay to use hyperbole this way sometimes to point at something that I perceive to be a bit of a problem in my tribe. In any case, it's weirdly distorting things when you then accuse me of something that only makes sense if you import your frame on what I said. I didn't think of this as being a virtue, so I wasn't claiming that other communities don't also have good people.
e) Your point "III" reminds me of this essay by Eliezer titled "Meta-Honesty: Firming Up Honesty Around Its Edge-Cases." Just like Eliezer in that essay explains that there are circumstances where he thinks you can hide info or even deceive, there are circumstances where I think you can move against someone and oust them without advance notice. If a prospective CEO interviews me as a board member, I'm happy to tell them exactly under which circumstances I would give them advance notice (or things like second and third chances) and under which ones I wouldn't. (This is what reminded me of the essay and the dialogues with the Gestapo officer.) (That said, I'd decline the role because I'd probably have overdosed on anxiety medication if I had been in the OpenAI board's position.)
The circumstances would have to be fairly extreme for me not to give advance warnings or second chances, so if a CEO thinks I'm the sort of person who doesn't have a habit of interpreting lots of things in a black-and-white and uncharitable manner, then they wouldn't have anything to fear, provided they're planning on behaving well and are at least minimally skilled at trust-building/making themselves/their motives/their reasons for actions transparent.
f) You say:
"I think it is damaging to the trust people place in board members, to see them act with so little respect or honor. It reduces everyone's faith in one another to see people in powerful positions behave badly."
I agree that it's damaging, but the way I see it, the problem here is the existence of psychopaths and other types of "bad actors" (or "malefactors"). They are why issues around trust and trustworthiness are sometimes so vexed and complicated. It would be wonderful if such phenotypes didn't exist, but we have to face reality. It doesn't actually help "the social fabric/fabric of trust" if one lends too much trust to people who abuse it to harm others and add more deception. On the contrary, it makes things worse.
g) I appreciate what you say in the first paragraph of your point IV! I feel the same way about this. (I should probably have said this earlier in my reply, but I'm about to go to sleep and so don't want to re-alphabetize all of the points.)
When I make an agreement to work closely with you on a crucial project,
I agree that there are versions of "agreeing to work closely together on the crucial project" where I see this as "speak up now or otherwise allow this person into your circle of trust." Once someone is in that circle, you cannot kick them out without notice just because you think you observed stuff that made you change your mind – if you could do that, it wouldn't work as a circle of trust.
So, there are circumstances where I'd agree with you. Whether the relationship between a board member and a CEO should be like that could be our crux here. I'd say yes in the ideal, but was it like that for the members of the board and Altman? I'd say it depends on the specific history. And my guess is that, no, there was no point where the board could have said "actually we're not yet sure we want to let Altman into our circle of trust, let's hold off on that." And there's no yes without the possibility of no.
if I think you're deceiving me, I will let you know.
I agree that one needs to do this if one has lost faith in people who once made it into one's circle of trust. However, let's assume they were never there to begin with. Then it's highly unwise to do so if you're dealing with someone without morals who feels zero obligation towards you in return. Don't give them an advance warning out of respect or a sense of moral obligation. If your mental model of the person is "this person will internally laugh at you for being stupid enough to give them advance warning and will gladly use the info against you," then it would be foolish to tell them. Batman shouldn't tell the Joker that he's coming for him.
I may move quickly to disable you if it's an especially extreme circumstance but I will acknowledge that this is a cost to our general cooperative norms where people are given space to respond even if I assign a decent chance to them behaving poorly.
What I meant to say in my initial comment is the same thing as you're saying here.
"Acknowledging the cost" is also an important thing in how I think about it, (edit) but I see that cost as not being towards the Joker (respect towards him), but towards the broader cooperative fabric. [Edit: deleted a passage here because it was long-winded.]
"If I assign a decent chance to them behaving poorly" – note that in my description, I spoke of practical certainty, not just "a decent chance that." Even in contexts where I think mutual expectations of trustworthiness and cooperativeness are lower than in what I call "circles of trust," I'm all in favor of preserving respect up until way past the point where you're just a bit suspicious of someone. It's just that, if the stakes are high and if you're not in a high-trust relationship with the person (i.e., you don't have a high prior that they're for sure cooperating back with you), there has to come a point where you'll stop giving them free information that could harm you.
I admit this is a step in the direction of act utilitarianism, and act utilitarianism is a terrible, wrong ideology. However, I think it's only a step and not all the way, and there's IMO a way to codify rules/virtues where it's okay to take these steps and you don't get into a slippery slope. We can have a moral injunction where we'd only make such moves against other people if our confidence is significantly higher than it needs to be on mere act utilitarian grounds. Basically, you either need smoking-gun evidence of something sufficiently extreme, or need to get counsel from other people and see if they agree to filter out unilateralism in your judgment, or have other solutions/safety-checks like that before allowing yourself to act.
I think what further complicates the issue is that there are "malefactor types" who are genuinely concerned about doing the right thing and who look like they're capable of cooperating with people in their inner circle, but who are then too ready to make huge rationalization-induced updates (almost like "splitting" in BPD) that the other party was bad all along and is now out of the circle. Their inner circle is way too fluid, and their true circle of trust is only themselves. The existence of this phenotype means that if someone like that tries to follow the norms I just advocated, they will do harm. How do I incorporate this into my suggested policy? I feel like this is analogous to discussions about modest vs. non-modest epistemology. What if you're someone who's deluded into thinking they're Napoleon/some genius scientist? If someone is deluded like that, non-modest epistemology doesn't work. To this, I say "epistemology is only helpful if you're not already hopelessly deluded." Likewise, what if your psychology is hopelessly self-deceiving and you'll do on-net-harmful self-serving things even when you try your best not to? Well, sucks to be you (or rather, sucks for other people that you exist), but that doesn't mean that people with a more trust-compatible psychology have to change the way they go about building a fabric of trust that importantly also has to be protected against invasion from malefactors.
I actually think it's a defensible position to say that the temptation to decide who is or isn't "trustworthy" is too big, that humans need moral injunctions, and that Batman should give the Joker an advance warning, so I'm not saying you're obviously wrong here. But I think my view is defensible as well, and I like it better than yours, and I'll keep acting in accordance with it. (One reason I like it better is that if I trust you and you play "cooperate" with someone who only knows deception and who moves against you and your cooperation partners and destroys a ton of value, then I shouldn't have trusted you either. Being too indiscriminately cooperative makes you less trustworthy in a different sort of way.)
Shame on you for suggesting only your tribe knows or cares about honoring partnerships with people after you've lost trust in them. Other people know what's decent too.
I think there's something off about the way you express whatever you meant to express here – something about how you're importing your frame of things over mine and claim that I said something in the language of your frame, which makes it seem more obviously bad/"shameful" than if you expressed it under my frame.
[Edit: November 22nd, 20:46 UK time. Oh, I get it now. You totally misunderstood what I meant here! I was criticizing EAs for doing this too naively. I was not praising the norms of my in-group (EA). Your reply actually confused me so much that I thought you were being snarky at me in some really strange way. Like, I thought you knew I was criticizing EAs. I guess you might identify as more of a rationalist than an EA, so I should have said "only EAs and rationalists" to avoid confusion. And like I say below, this was somewhat hyperbolic.]
In any case, I'd understand it if you had said something like "shame on you for disclosing to the world that you think of trust in a way that makes you less trustworthy (according to my, Ben's, interpretation)." If that's what you had said, my reply is that I hope you no longer think this after reading what I elaborated on above.
Edit: And to address the part about "your tribe" – okay, I was being hyperbolic about only EAs having a tendency to be (what-I-consider-to-be) naive when it comes to applying norms of cooperation. It's probably also common in other high-trust ideological communities. I think it actually isn't very common in Silicon Valley, which very much supports my point here. When people get fired or backstabbed over startup drama (I'm thinking of the movie The Social Network), they are not given a three-month adjustment period where nothing really changes except that they now know what's coming. Instead, they have their privileges revoked and passwords changed and have to leave the building. I think focusing on how much notice someone was given is more a part of the power struggle and the war over who has enough leverage to get others on their side than it is genuinely about "this particular violation of niceness norms is so important that it deserves to be such a strong focus of this debate." Correspondingly, I think people would complain a lot less about how much notice was given if the board had done a better job convincing others that their concerns were fully justified. (Also, Altman himself certainly wasn't going to give Helen a lot of time to stay on the board and adjust to the upcoming change, still talking to others about her views and participating in board stuff, etc., when he initially thought he could get rid of her.)
Maybe, yeah. I definitely strongly agree that not telling the staff a more complete story seems bad for both intrinsic and instrumental reasons.
I'm a bit unsure how wise it would be to tip Altman off in advance given what we've seen he can mobilize in support of himself.
And I think it's a thing that only EAs would think up that it's valuable to be cooperative towards people who you're convinced are deceptive/lack integrity. [Edit: You totally misunderstood what I meant here; I was criticizing them for doing this too naively. I was not praising the norms of my in-group. Your reply actually confused me so much that I thought you were being snarky in some really strange way.] Of course, they have to consider all the instrumental reasons for it, such as how it'll reflect on them if others don't share their assessment of the CEO lacking integrity.
Hm, to add a bit more nuance, I think it's okay at a normal startup for a board to be comprised of people who are likely to almost always side with the CEO, as long as they are independent thinkers who could vote against the CEO if the CEO goes off the rails. So, it's understandable (or even good/necessary) for CEOs to care a lot about having "aligned" people on the board, as long as they don't just add people who never think for themselves.
It gets more complex in OpenAI's situation where there's more potential for tensions between CEO and the board. I mean, there shouldn't necessarily be any tensions, but Altman probably had less of a say over who the original board members were than a normal CEO at a normal startup, and some degree of "norms-compliant maneuvering" to retain board control feels understandable because any good CEO cares a great deal about how to run things. So, it actually gets a bit murky and has to be judged case-by-case. (E.g., I'm sure Altman feels like what happened vindicated him wanting to push Helen off the board.)
Yeah, that makes sense and does explain most things, except that, if I were Helen, I don't currently see why I wouldn't have just explained that part of the story early on.* Even so, I still think this sounds very plausible as part of the story.
*Maybe I'm wrong about how people would react to that sort of justification. Personally, I think the CEO messing with the board constitution to gain de facto ultimate power is clearly very bad and any good board needs to prevent that. I also believe that it's not a reason to remove a board member if they publish a piece of research that's critical of or indirectly harmful for your company. (Caveat that we're only reading a secondhand account of this, and maybe what actually happened would make Altman's reaction seem more understandable.)
One thing I've realized more in the last 24h:
- It looks like Sam Altman is using a bunch of "tricks" now trying to fight his way back into more influence over OpenAI. I'm not aware of anything I'd consider unethical (at least if one has good reasons to believe one has been unfairly attacked), but it's still the sort of stuff that wouldn't come naturally to a lot of people and wouldn't feel fair to a lot of people (at least if there's a strong possibility that the other side is acting in good faith too).
- Many OpenAI employees have large monetary incentives on the line and there's levels of peer pressure that are off the charts, so we really can't read too much into who tweeted how many hearts or signed the letter or whatever.
Maybe the extent of this was obvious to most others, but for me, while I was aware that this was going on, I feel like I underestimated the extent of it. One thing that put things into a different light for me is this tweet.
Which makes me wonder, could things really have gone down a lot differently? Sure, smoking-gun-type evidence would've helped the board immensely. But is it their fault that they don't have it? Not necessarily – not if they had (1) time pressure (for one reason or another – hard to know at this point) and (2) still enough 'soft' evidence to justify drastic actions. With (1) and (2) together, it could have made sense to risk intervening even without smoking-gun-type evidence.
(2) might be a crux for some people, but I believe that there are situations where it's legitimate for a group of people to become convinced that someone else is untrustworthy without being in a position to easily and quickly convince others. NDAs in play could be one reason, but also just "the evidence is of the sort that 'you had to be there'" or "you need all this other context and individual data points only become compelling if you also know about all these other data points that together help rule out innocuous/charitable interpretations about what happened."
In any case, many people highlighted the short notice with which the board announced their decision and commented that this implies the board acted in an outrageous way and seems inexperienced. However, having seen what Altman managed to mobilize in just a couple of days, it now seems obvious that, if you think he's scheming and deceptive in a genuinely bad way (as opposed to "someone who knows how to fight power struggles and is willing to fight them when he feels he's undeservedly under attack" – which isn't by itself a bad thing), then you simply can't give him a head start.
So, while I still think the board made mistakes, I today feel a bit less confident that these mistakes were necessarily as big as I initially thought. I now think it's possible – but far from certain – that we're in a world where things are playing out the way they have mostly because it's a really tough situation for the board to be in even when they are right. And, sure, that would've been a reason to consider not starting this whole thing, but obviously that's very costly as well, so, again, tough situation.
I guess a big crux is "how common is it that you justifiably think someone is bad but it'll be hard to convince others?" My stance is that, if you're right, you should eventually be able to convince others if the others are interested in the truth and you get a bunch of time and the opportunity to talk to more people who may have extra info. But you might not be able to succeed if you only have a few days and then you're out if you don't sound convincing enough.
My opinions have been fluctuating a crazy amount recently (I don't think I've ever been in a situation where my opinions have gone up and down like this!), so, idk, I may update quite a bit in the other direction again tomorrow.
Having a "plan A" requires detailed advance-planning. I think it's much more likely that their decision was reactive rather than plan-based. They felt strongly that Altman had to go based on stuff that happened, and so they followed procedures – appoint an interim CEO and do a standard CEO search. Of course, it's plausible – I'd even say likely – that an "Anthropic merger" was on their mind as something that could happen as a result of this further down the line. But I doubt (and hope not) that this thought made a difference to their decision.
Reasoning:
- If they had a detailed plan that was motivating their actions (as opposed to reacting to a new development and figuring out what to do as things go on), they would probably have put in a bit more time gathering more potentially incriminating evidence or trying to form social alliances.
For instance, even just visiting OpenAI in the months or weeks before, saying hi to employees, introducing themselves as the board, etc., would probably have improved staff's perception of how this went down. Similarly, gathering more evidence by, e.g., talking to people close to Altman but sympathetic to safety concerns, asking whether they feel heard in the company, etc., could have unearthed more ammunition. (It's interesting that even the safety-minded researchers at OpenAI basically sided with Altman here, or, at the very least, none of them came to the board's aid by speaking up against Altman on similar counts. [Though I guess it's hard to speak up "on similar counts" if people don't even really know the board's primary concerns apart from the vague "not always candid."])
- If the thought of an Anthropic merger did play a large role in their decision-making (in the sense of "making the difference" to whether they act, across many otherwise-similar counterfactuals), that would constitute a bad kind of scheming/plotting. People who scheme like that are probably less likely than baseline to underestimate power politics and the difficulty of ousting a charismatic leader, and more likely than baseline to prepare well for the fight. Like, if you think your actions are perfectly justified per your role as board member (i.e., if you see yourself as acting as a good board member), that's exactly the situation in which you're most likely to overlook the possibility that Altman may just go "fuck the board!" and ignore your claim to legitimacy. By contrast, if you're kind of aware that you're scheming and using the fact that you're a board member merely opportunistically, it might more readily cross your mind that Altman might scheme back at you and use the fact that he knows everyone at the company and has a great reputation in the Valley at large.
- It seems like the story feels overall more coherent if the board perceived themselves to be acting under some sort of time-pressure (I put maybe 75% on this).
- Maybe they felt really anxious or uncomfortable with the 'knowledge' or 'near-certainty' (as it must have felt to them, if they were acting as good board members) that Altman is a bad leader, so they sped things up because it was psychologically straining to deal with the uncertain situation.
- Maybe Altman approaching investors made them worry that if he succeeds, he'd acquire too much leverage.
- Maybe Ilya approached them with something and prompted them to react to it and do something, and in the heat of the moment, they didn't realize that it might be wise to pause and think things through and see if Ilya's mood is a stable one.
- Maybe there was a capabilities breakthrough and the board and Ilya were worried the new system may not be safe enough especially considering that once the weights leak, people anywhere on the internet can tinker with the thing and improve it with tweaks and tricks.
- [Many other possibilities I'm not thinking of.]
- [Update – I posted this before gwern's comment, but didn't realize until he said it that this is waaay more likely to be the case than the other possibilities.] I read a rumor in a new article about talks to replace another board member, so maybe there was time pressure to act before Altman and Brockman could appoint a new board member who would always side with them.
were surprised when he rejected them
I feel like you're not really putting yourself into the shoes of the board members if you think that, by the time they were asking around for CEOs, they were surprised that someone like Dario (with the reputation of his entire company at risk) would reject them. At that point, the whole situation was such a mess that they must have felt extremely bad and desperate, going around frantically asking for someone to come in and help save the day. (But probably you just phrased it like that because you suspect that, in their initial plan where Altman just accepts defeat, the replacement-CEO search would have gone over smoothly. That makes sense to me conditional on them having formed such a detailed-but-naive "plan A.")
Edit: I feel confident in my stance, but not massively so; I reserve maybe 14% for a hypothesis more like the one you suggested, partly updating towards habryka's cynicism, which, unfortunately, I think has had a somewhat good track record recently.
Yeah, but if this is the case, I'd have liked to see a bit more balance than just retweeting the tribal-affiliation slogan ("OpenAI is nothing without its people") and saying that the board should resign (or, in Ilya's case, implying that he regrets and denounces everything he initially stood for together with the board). I think it's a defensible take that the board should resign after how things went down, but the board was probably pointing to some real concerns, and those won't get addressed at all if the pendulum now swings way too far in the opposite direction. So I would have at least hoped for something like "the board should resign, but here are some things I think they had a point about, which I'd like to see not get swept under the carpet after the counter-revolution."
In any case, it was weird that they had LeCun in charge and a thing called the "Responsible AI team" in the same company. No matter what one thinks about Sam Altman now, the things he said about AI risks sounded 100 times more reasonable than what LeCun has said.
Okay, that's fair.
FWIW, I think it's likely that they thought about this decision for quite some time and systematically; the initial announcement did mention a "deliberative review process by the board." But we don't get to see any of what they thought about, or who (if anyone) they consulted to gather further evidence or verify Sutskever's claims. Unfortunately, we just don't know yet. And I concede that, given the little info we have, it takes charitable priors to end up with "my view." (I put it in quotation marks because it's not like I have more than 50% confidence in it; mostly, I want to flag that this view is still very much on the table.)
Also, on the part about "imply that Sam had done some pretty serious deception, without anything to back that up with": I'm >75% that either Eliezer nailed it in this tweet, or they actually have evidence of something pretty serious but decided not to disclose it for reasons that have to do with the nature of what happened. (I guess the third option is that they self-deceived into thinking their reasons for firing Altman would seem serious/compelling [or at least defensible] to everyone they gave more info to, when in fact the reasoning is more subtle/subjective/dependent on assumptions that many others wouldn't share. This could then have become apparent to them when they had to explain their reasoning to OpenAI staff later on, and they aborted the attempt partway through when they noticed it wasn't landing well, leaving the other party confused. I don't think that would necessarily imply anything bad about the board members' character, though it's worth noting that if someone self-deceives in that way too strongly or too often, it makes for a common malefactor pattern, and obviously it wouldn't reflect well on their judgment in this specific instance. One reason I consider this hypothesis less likely than the others is that it's rare for several people (the four board members) to all make the same mistake about whether their reasoning will seem compelling to others, and for none of them to realize that it would be better to err on the side of caution and instead say something like "we noticed we have strong differences in vision with Sam Altman.")
[...] reputational trade (OpenAI got to hire a bunch of talent from EA spaces and make themselves look responsible to the world) [...]
Yes, I think "reputational trade," i.e., something that's beneficial for both parties, is an important part of the story that the media hasn't really picked up on. EAs were focused on the dangers and benefits of AI way before anyone else, so it carries quite some weight when EA opinion leaders put an implicit seal of approval on the new AI company.
There's a tension between
(1) previously having held back on natural-seeming criticism of OpenAI ("putting the world at risk for profits," or "they plan on wielding this immense power of building god / single-handedly starting something bigger than the next Industrial Revolution / making all jobs obsolete and solving all major problems") because it had the seal of approval from this public-good, non-profit, beneficial-mission-focused board structure,
and
(2) being outraged when this board structure does something that it was arguably intended to do (at least under some circumstances).
(Of course, the specifics of how and why things happened matter a lot, and maybe most people aren't outraged because the board did something, but rather because of how they did it, or out of skepticism about the reasons and justifications. On those latter points, I sympathize more with people who are outraged or concerned that something didn't go right. But we don't know all the details yet.)