Vote on worthwhile OpenAI topics to discuss 2023-11-21T00:03:03.898Z
Vote on Interesting Disagreements 2023-11-07T21:35:00.270Z
Online Dialogues Party — Sunday 5th November 2023-10-27T02:41:00.506Z
More or Fewer Fights over Principles and Values? 2023-10-15T21:35:31.834Z
Dishonorable Gossip and Going Crazy 2023-10-14T04:00:35.591Z
Announcing Dialogues 2023-10-07T02:57:39.005Z
Closing Notes on Nonlinear Investigation 2023-09-15T22:44:58.488Z
Sharing Information About Nonlinear 2023-09-07T06:51:11.846Z
A report about LessWrong karma volatility from a different universe 2023-04-01T21:48:32.503Z
Shutting Down the Lightcone Offices 2023-03-14T22:47:51.539Z
Open & Welcome Thread — February 2023 2023-02-15T19:58:00.435Z
Rationalist Town Hall: FTX Fallout Edition (RSVP Required) 2022-11-23T01:38:25.516Z
LessWrong Has Agree/Disagree Voting On All New Comment Threads 2022-06-24T00:43:17.136Z
Announcing the LessWrong Curated Podcast 2022-06-22T22:16:58.170Z
Good Heart Week Is Over! 2022-04-08T06:43:46.754Z
Good Heart Week: Extending the Experiment 2022-04-02T07:13:48.353Z
April 2022 Welcome & Open Thread 2022-04-02T03:46:13.743Z
Replacing Karma with Good Heart Tokens (Worth $1!) 2022-04-01T09:31:34.332Z
12 interesting things I learned studying the discovery of nature's laws 2022-02-19T23:39:47.841Z
Ben Pace's Controversial Picks for the 2020 Review 2021-12-27T18:25:30.417Z
Book Launch: The Engines of Cognition 2021-12-21T07:24:45.170Z
An Idea for a More Communal Petrov Day in 2022 2021-10-21T21:51:15.270Z
Facebook is Simulacra Level 3, Andreessen is Level 4 2021-04-28T17:38:03.981Z
Against "Context-Free Integrity" 2021-04-14T08:20:44.368Z
"Taking your environment as object" vs "Being subject to your environment" 2021-04-11T22:47:04.978Z
I'm from a parallel Earth with much higher coordination: AMA 2021-04-05T22:09:24.033Z
Why We Launched LessWrong.SubStack 2021-04-01T06:34:00.907Z
"Infra-Bayesianism with Vanessa Kosoy" – Watch/Discuss Party 2021-03-22T23:44:19.795Z
"You and Your Research" – Hamming Watch/Discuss Party 2021-03-19T00:16:13.605Z
Review Voting Thread 2020-12-30T03:23:06.075Z
Final Day to Order LW Books by Christmas for US 2020-12-09T23:30:36.877Z
The LessWrong 2018 Book is Available for Pre-order 2020-12-01T08:00:00.000Z
AGI Predictions 2020-11-21T03:46:28.357Z
Rationalist Town Hall: Pandemic Edition 2020-10-21T23:54:03.528Z
Sunday October 25, 12:00PM (PT) — Scott Garrabrant on "Cartesian Frames" 2020-10-21T03:27:12.739Z
Sunday October 18, 12:00PM (PT) — Garden Party 2020-10-17T19:36:52.829Z
Have the lockdowns been worth it? 2020-10-12T23:35:14.835Z
Fermi Challenge: Trains and Air Cargo 2020-10-05T21:51:45.281Z
Postmortem to Petrov Day, 2020 2020-10-03T21:30:56.491Z
Open & Welcome Thread – October 2020 2020-10-01T19:06:45.928Z
What are good rationality exercises? 2020-09-27T21:25:24.574Z
Honoring Petrov Day on LessWrong, in 2020 2020-09-26T08:01:36.838Z
Sunday August 23rd, 12pm (PDT) – Double Crux with Buck Shlegeris and Oliver Habryka on Slow vs. Fast AI Takeoff 2020-08-22T06:37:07.173Z
Forecasting Thread: AI Timelines 2020-08-22T02:33:09.431Z
[Oops, there is actually an event] Notice: No LW event this weekend 2020-08-22T01:26:31.820Z
Highlights from the Blackmail Debate (Robin Hanson vs Zvi Mowshowitz) 2020-08-20T00:49:49.639Z
Survey Results: 10 Fun Questions for LWers 2020-08-19T06:10:55.386Z
10 Fun Questions for LessWrongers 2020-08-18T03:28:05.276Z
Sunday August 16, 12pm (PDT) — talks by Ozzie Gooen, habryka, Ben Pace 2020-08-14T18:32:35.378Z
Is Wirecutter still good? 2020-08-07T21:54:06.141Z


Comment by Ben Pace (Benito) on Lying Alignment Chart · 2023-11-29T20:02:56.303Z · LW · GW

This feels like it wants to be a poll to me.

My first idea is to just have a poll like the other two we've had recently, where there are 9 entries and you agree/disagree with whether each statement is a lie.

I'm interested in any other suggestions for how to set up a poll.

Comment by Ben Pace (Benito) on Preserving our heritage: Building a movement and a knowledge ark for current and future generations · 2023-11-29T19:20:47.897Z · LW · GW

Mod feedback: This post would majorly benefit from a tl;dr; it took me a long time to find out that this post's content is a bill for how tech companies should deal with the accounts of deceased users. Something about the writing style also seems a bit overly elaborate; perhaps that's what the language model thinks essayists sound like.

Comment by Ben Pace (Benito) on Social Dark Matter · 2023-11-29T18:52:18.800Z · LW · GW

I am a bit confused how to relate to covertly breaking social norms.

In general I think you can't always tell whether a norm is dumb just by looking at the moral character of the people breaking it. Sometimes silly norms are only violated by reckless and impulsive people with little ability to self-regulate and little care for ethics, and in some cases breaking them isn't worth the cost to general norm-following behavior.

But as I say, still confused about the issue.

Comment by Ben Pace (Benito) on My techno-optimism [By Vitalik Buterin] · 2023-11-29T18:47:30.611Z · LW · GW

(Clarification: I didn't mean to say that this banner succeeded. I meant to say it was a worthwhile thing to attempt.)

Comment by Ben Pace (Benito) on My techno-optimism [By Vitalik Buterin] · 2023-11-29T05:39:28.204Z · LW · GW

I think it's relevant that Vitalik is 29, Bostrom is 50, and Yudkowsky is 44 (plus he has major chronic health issues).

I'd also say that the broader society has been much more supportive of Vitalik than it has been of Bostrom and Yudkowsky (billionaire, TIME cover, 5M Twitter followers, etc.), putting him in a better place personally to try to do the ~political work of uniting people. He is also far more respected by the folks in the accelerationist camp, making it more worthwhile for him to invest in an intellectual account that includes their dreams of the future (which he largely shares).

Comment by Ben Pace (Benito) on My techno-optimism [By Vitalik Buterin] · 2023-11-29T03:58:31.140Z · LW · GW

It is a good thing to actually try to find a banner to unite all the peoples...

Comment by Ben Pace (Benito) on A day in the life of a mechanistic interpretability researcher · 2023-11-28T18:40:33.915Z · LW · GW

That was fun to watch. But I would appreciate someone spelling out the implied connection to mechanistic interpretability.

Comment by Ben Pace (Benito) on AISC project: How promising is automating alignment research? (literature review) · 2023-11-28T18:37:33.983Z · LW · GW

I would encourage you to include ~any of your content in the body of the LW post, as I expect ~most people do not click through to read links, especially with very little idea of what's in the link.

Comment by Ben Pace (Benito) on Social Dark Matter · 2023-11-27T19:02:25.627Z · LW · GW

Curated. This is an interesting, thoughtful, and very engagingly written discussion of the things people are incentivized to hide, and how to reason about them.

I would have considered this for curation regardless, but I was slightly more inclined to curate it given that it's Duncan's last post on LessWrong (an outcome he assigns 75% probability to). I hope he continues to write many excellent essays somewhere on the public internet, and personally I have signed up as a paying subscriber to his Substack to support him doing so there.

Comment by Ben Pace (Benito) on Benito's Shortform Feed · 2023-11-27T00:34:07.871Z · LW · GW

I think this is accurately described as "an EA organization got a board seat at OpenAI", and the actions of those board members reflect directly on EA (whether internally or externally).

Why did OpenAI come to trust Holden with this position of power? My guess is that Holden's and Dustin's personal reputations were substantial factors here, along with Open Philanthropy being a major funding source, but also that many involved people's excitement about and respect for the EA movement were a relevant factor in OpenAI wanting to partner with Open Philanthropy, and that Helen's and Tasha's actions have directly and negatively reflected on how the EA ecosystem is viewed by OpenAI leadership.

There's a separate question about why Holden picked Helen Toner and Tasha MacAulay, and to what extent they were given power in the world by the EA ecosystem. It seems clear that they have gotten power through their participation in the EA ecosystem (as OpenPhil is an EA institution); and to the extent that the EA ecosystem advertises itself as more moral than other places, if they executed the standard level of deceptive strategies that others in the tech industry would in their shoes, then that advertising was false messaging.

Comment by Ben Pace (Benito) on Benito's Shortform Feed · 2023-11-27T00:15:31.440Z · LW · GW

Some historical context

Holden in 2013 on the GiveWell blog:

We’re proud to be part of the nascent “effective altruist” movement. Effective altruism has been discussed elsewhere (see Peter Singer’s TED talk and Wikipedia); this post gives our take on what it is and isn’t.

Holden in 2015 on the EA Forum (talking about GiveWell Labs, which grew into OpenPhil):

We're excited about effective altruism, and we think of GiveWell as an effective altruist organization (while knowing that this term is subject to multiple interpretations, not all of which apply to us).

Holden in April 2016 about plans for working on AI:

Potential risks from advanced artificial intelligence will be a major priority for 2016. Not only will Daniel Dewey be working on this cause full-time, but Nick Beckstead and I will both be putting significant time into it as well. Some other staff will be contributing smaller amounts of time as appropriate.

(Dewey who IIRC had worked at FHI and CEA ahead of this, and Beckstead from FHI.)

Holden in 2016 about why they're making potential risks from advanced AI a priority:

I believe the Open Philanthropy Project is unusually well-positioned from this perspective:

  • We are well-connected in the effective altruism community, which includes many of the people and organizations that have been most active in analyzing and raising awareness of potential risks from advanced artificial intelligence. For example, Daniel Dewey has previously worked at the Future of Humanity Institute and the Future of Life Institute, and has been a research associate with the Machine Intelligence Research Institute.

Holden about the OpenAI grant in 2017:

This grant initiates a partnership between the Open Philanthropy Project and OpenAI, in which Holden Karnofsky (Open Philanthropy’s Executive Director, “Holden” throughout this page) will join OpenAI’s Board of Directors and, jointly with one other Board member, oversee OpenAI’s safety and governance work.

OpenAI initially approached Open Philanthropy about potential funding for safety research, and we responded with the proposal for this grant. Subsequent discussions included visits to OpenAI’s office, conversations with OpenAI’s leadership, and discussions with a number of other organizations (including safety-focused organizations and AI labs), as well as with our technical advisors.

As a negative datapoint: I looked through a bunch of the media articles linked at the bottom of this GiveWell page, and most of them do not mention Effective Altruism, only effective giving / cost-effectiveness. So their Effective Altruist identity has had less visibility amongst folks who primarily know of Open Philanthropy through its media appearances.

Comment by Ben Pace (Benito) on Benito's Shortform Feed · 2023-11-26T21:57:27.815Z · LW · GW

In this mess, Altman and Helen should not be held to the same ethical standards, because I believe one of them has been given a powerful career in substantial part based on her commitment to the higher ethical standards of a movement that prided itself on openness and transparency and trying to do the most good.

If Altman played deceptive strategies, and insofar as Helen played back the same deceptive strategies as Altman, then she did not honor the EA name.

(The name has a lot of dirt on it these days already, but still. It is a name that used to mean something back when it gave her power.)

Insofar as you got a position specifically because you were affiliated with a movement claiming to be good and open and honest and to have unusually high moral standards, and then when you arrive you become a standard political player, that's disingenuous.

Comment by Ben Pace (Benito) on Benito's Shortform Feed · 2023-11-25T02:29:11.005Z · LW · GW

The most important thing right now: I still don't know why they chose to fire Altman, and especially why they chose to do it so quickly

That's an exceedingly costly choice to make (i.e. with the speed of it), and so when I start to speculate on why, I only come up with commensurately worrying states of affairs, e.g. that he did something egregious enough to warrant it, or that he didn't and the board acted with great hostility.

Them going back on their decision is Bayesian evidence for the latter — if he'd done something egregious, they'd just be able to tell relevant folks, and Altman wouldn't get his job back.

So many people are asking this (e.g. everyone at the company). I'll be very worried if the reason doesn't come out.

Comment by Ben Pace (Benito) on Benito's Shortform Feed · 2023-11-25T02:09:57.264Z · LW · GW

Also, I don't know that I've said this, but from reading enough of his public tweets, I had blocked Sam Altman long ago. He seemed very political in how he used speech, and so I didn't want to include him in my direct memetic sphere.

As a small pointer to why: he would commonly choose not to share object-level information about something, but instead share how he thought social reality should change. I think I recall him saying that the social consensus was wrong about fusion energy and pushing for it to move in a specific direction; he did this rather than just plainly saying what his object-level beliefs about fusion were, or offering a particular counter-argument to an argument that was going around.

It's been a year or two since I blocked him, so I don't recall more specifics, but it seemed worth mentioning, as a datapoint for folks to include in their character assessments.

Comment by Ben Pace (Benito) on Benito's Shortform Feed · 2023-11-24T07:20:37.671Z · LW · GW

My current guess is that most of the variance in what happened is explained by a board where 3 out of 4 people don't know the dynamics of upper management in a multi-billion dollar company, where the board members don't know each other well, and (for some reason) the decision was made very suddenly. Pretty low expectations given that situation. Seems like Shear was a pretty great replacement to get, given the hand dealt. Assuming that they had a legit reason to fire the CEO, it's probably primarily through lack of skill and competence that they failed, more so than as a result of Altman's superior deal-making skill and leadership abilities (though that was what finished it off).

Comment by Ben Pace (Benito) on Benito's Shortform Feed · 2023-11-24T07:06:55.318Z · LW · GW

In brief: I'm saying that once you condition on:

  1. The board decided the firing was urgent.
  2. The board does not know each other very well and defaults to making decisions by consensus.
  3. The board is immediately in a high-stakes high-stress situation.

Then you naturally get

  4. The board fails to come to consensus on public comms about the decision.

Comment by Ben Pace (Benito) on Benito's Shortform Feed · 2023-11-24T06:57:25.780Z · LW · GW

I'm not quite sure in the above comment how to balance between "this seems to me like it could explain a lot" and also "might just be factually false". So I guess I'm leaving this comment, lampshading it.

Comment by Ben Pace (Benito) on Benito's Shortform Feed · 2023-11-24T06:49:45.761Z · LW · GW

I don't normally just write-up takes, especially about current events, but here's something that I think is potentially crucially relevant to the dynamics involved in the recent actions of the OpenAI board, that I haven't seen anyone talk about:

The four members of the board who did the firing do not know each other very well.

Most boards meet a few times per year, for a couple of hours. Only Sutskever works at OpenAI. D'Angelo has worked in senior roles at tech companies like Facebook and Quora, Toner is in EA/policy, and MacAulay has worked at other tech companies (I'm not aware of any overlap with D'Angelo).

It's plausible to me that MacAulay and Toner have spent more than 50 hours in each others' company, but overall I'd probably be willing to bet at even odds that no other pair of them had spent more than 10 hours together before this crisis.

This is probably a key factor in why they haven't written more publicly about their decision. Decision-by-committee is famously terrible, and it's pretty likely to me that everyone pushes back hard on anything unilateral by the others in this high-tension scenario. So any writing representing them has to get consensus, and they're too focused on firefighting and getting a new CEO to spend time iterating on an explanation of their reasoning that they can all get behind. That's why Sutskever's public writing only speaks for himself (he just says that he regrets the decision; he's said nothing about why, nor anything that in principle speaks for the others).

I think this also predicts that Shear getting involved, and being the only direct counterparty that they must collectively and repeatedly work something out with, improved things. (Which accounts I've read suggest was a turning point in the negotiations.) He's the first person that they are all engaged with and need to make things work out with, so he is in a position where they are forced to get consensus in a timely fashion, and he can actually demand specific things of them. This was a forcing function on them making decisions and continuing to communicate with an individual.

It's standard to expect them to prepare a proper explanation in advance, but from the information in this comment, I believe this firing decision was made within just a couple of days of the event. A fast decision may have been the wrong call, but once it happened, a team who doesn't really know each other was thrust into an extremely high-stakes position and had to make decisions by consensus. My guess is that this was really truly quite difficult and it was very hard to get anything done at all.

This lens on the situation makes me update in the direction that they will eventually talk about why, once they've had time to iterate on the text explaining the reasoning, now that the basic function of the company isn't under fire.

My current guess is that in many ways, a lot of the board's decision-making since the firing has been worse than any individual's on the board would've been had they been working alone.

Comment by Ben Pace (Benito) on AI Timelines · 2023-11-24T06:05:32.919Z · LW · GW

You'd be more likely to get this change if you suggested a workable alternative.

Comment by Ben Pace (Benito) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-24T05:12:23.871Z · LW · GW

Yep, I can see how that could be confusing in context.

Comment by Ben Pace (Benito) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-24T05:11:37.235Z · LW · GW

If you've entered an agreement with someone, and later learned that they intend (and perhaps have always intended) to exploit your acting in accordance with it to screw you over, it seems both common-sensically and game-theoretically sound to consider the contract null and void, since it was agreed-to based on false premises.

If you make a trade agreement, and the other side does not actually pay up, then I do not think you are bound to provide the good anyway. It was a trade.

If you make a commitment, and then later come to realize that in requesting that commitment the other party was actually taking advantage of you, I think there are a host of different strategies one could pick. My current ideal solution is "nonetheless follow through on your commitment, but make them pay for it in some other way", but I acknowledge that there are times when it's correct to pick other strategies, like "just don't do it, and when anyone asks you why, give them a straight answer", and more.

Your strategy in a given domain will also depend on all sorts of factors like how costly the commitment is, how much they're taking advantage of you for, what recourse you have outside of the commitment (e.g. if they've broken the law they can be prosecuted, but in other cases it is harder to punish them).

The thing I currently believe and want to say here is that it is not good to renege on commitments even if you have reason to, and it is better to not renege on them while setting the incentives right. It can be the right choice to do so in order to set the incentives right, but even when it's the right call I want to acknowledge that this is a cost to our ability to trust in people's commitments.

Comment by Ben Pace (Benito) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-24T04:47:09.454Z · LW · GW

Eh, I prefer to understand why the rules exist rather than blindly commit to them. Similarly, the Naskapi hunters used divination as a method of ensuring they'd randomize their hunting spots, and I think it's better to understand why you're doing it, rather than doing it because you falsely believe divination actually works.

Comment by Ben Pace (Benito) on Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs · 2023-11-24T02:54:07.581Z · LW · GW

This post has a bunch of comments and was linked from elsewhere, so I've gone through and cleaned up the formatting a bunch.

In future, please 

  • Name the source in the post, rather than just providing a link that the reader has to click through to find who's speaking
  • Use the quotes formatting, so readers can easily distinguish between quotes and your text
  • Format the quotes as they were originally; do not include your own sentences like "Aligns with DeepMind Chief AGI scientist Shane Legg saying:" in a way that reads as though they were part of the original quote, nor cut tweets together and skip over replies as though they were a single quote.

Comment by Ben Pace (Benito) on TurnTrout's shortform feed · 2023-11-24T00:18:58.223Z · LW · GW

They are not being treated worse than foot soldiers, because they do not have an enemy army attempting to murder them during the job. (Unless 'foot soldiers' is itself more commonly used as a metaphor for 'grunt work' and I'm not aware of that.)

Comment by Ben Pace (Benito) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-23T19:44:12.607Z · LW · GW

This question seems like a difficult interaction between utilitarianism and virtue ethics...

I think whether to be honorable is in large part a question of strategy. If you don't honor implicit agreements on the few occasions when you really need to win, that's a pretty different strategy from honoring implicit agreements all of the time. So it's not a question local to a single decision, it's a broader strategic question.

I am sympathetic to consequentialist evaluations of strategies. I am broadly like "If you honor implicit agreements then people will be much more willing to trade with you and give you major responsibilities, and so going back on them on one occasion generally strikes down a lot of ways you might be able to affect the world." It's not just about this decision, but about an overall comparison of the costs and benefits of different kinds of strategies. There are many strategies one can play.

I could potentially make up some fake numbers to give a sense of how different decisions change which strategies to run (e.g. people who play more to the letter than the spirit of agreements, people who will always act selfishly if the payoff is at least going to 2x their wealth, people who care about their counter-parties ending up okay, people who don't give a damn about their counter-parties ending up okay, etc.). I roughly think much more honest, open, straightforward, pro-social, and simple strategies are widely trusted, are better for keeping you and your allies sane, and are more effective on the particular issues you care about, but are less effective at getting generic un-scoped power. I don't much care about the latter relative to the first three, so these strategies seem to me way better at achieving my goals.

I think it's extremely costly for trust to entirely change strategies during a single high-stakes decision, so I don't think it makes sense to re-evaluate it during the middle of the decision based on a simple threshold. (There could be observations that would make me realize during a high-stakes thing that I had been extremely confused about what game we were even playing, and then I'd change, but that doesn't fit as an answer to your question, which is about a simple probability/utility tradeoff.) It's possible that your actions on a board like this are overwhelmingly the most important choices you'll make and should determine your overall strategy, and you should really think through that ahead of time and let your actions show what strategy you're playing — well before agreeing to be on such a board.

Hopefully that explained how I think about the tradeoff you asked, while not giving specific numbers. I'm willing to answer more on this.

(Also, a minor correction: I said I was considering broadly dis-endorsing the sacred, for that reason. It seems attractive to me as an orientation to the world but I'm pretty sure I didn't say this was my resolute position.)

Comment by Ben Pace (Benito) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-23T19:17:07.503Z · LW · GW

Oh, but I don't mean to say that Lukas was excluding me. I mean he was excluding all other people who exist who would also care about honoring partnerships after losing faith in the counter-party, of which there are more than just me, and more than just EAs.

Comment by Ben Pace (Benito) on Open Thread – Autumn 2023 · 2023-11-23T19:12:44.329Z · LW · GW

That's a lot of sailing! What did you get up to while doing it? Reading books? Surfing the web?

Comment by Ben Pace (Benito) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-23T01:09:14.215Z · LW · GW


I find the situation a little hard to talk about concretely because whatever concrete description I give will not be correct (because nobody involved is telling us what happened).

Nonetheless, let us consider the most uncharitable narrative regarding Altman here, where members of the board come to believe he is a lizard, a person who is purely selfish and who has no honor. (To be clear I do not think this is accurate, I am using it for communication purposes.) Here are some rules.

  • Just because someone is a lizard, does not make it okay to lie to them
  • Just because someone is a lizard, does not make it okay to go back on agreements with them
  • Agreements and commitments the lizard made while he had the mandate to act on behalf of your company are not now okay to disregard

The situation must not be "I'll treat you honorably if I think you're a good person, but the moment I decide you're a lizard then I'll act with no honor myself." The situation must be "I will treat you honorably because it is right to be honorable." Otherwise the honor will seep out of the system as the probabilities we assign to others' honor waver.

I think it is damaging to the trust people place in board members, to see them act with so little respect or honor. It reduces everyone's faith in one another to see people in powerful positions behave badly.


I respect that in response to my disapproval of your statement, you took the time to explain in detail the reasoning behind your comment and communicate some more of your perspective on the relevant game theory. I think it generally helps, when folks are having conflicts, to openly examine the reasons why decisions were made and investigate those. And it also gives us more surface area for locating key parts of the disagreement.

I still disagree with you. I think it was an easy-and-wrong thing to suggest that only people in the EA tribe would care about this important ethical principle I care about. But I am glad we're exploring this rather than just papering over it, or just being antagonistic, or just leaving.



CEO:

"Suppose you come to the conclusion that I'm a lizard. Will you give me no chance for a rebuttal, and fire me immediately, without giving our business partners notice, and never give me a set of reasons, and never tell our staff a set of reasons?"

Prospective Board Member:

"No, you can be confident that I would not do that. We would conduct an investigation, and at that time bar your ability to affect the board. We would be candid with the staff about our concerns, and we would not wantonly harm the agreements you made with your business partners."


CEO:

"But what if you came to believe that I was maneuvering to remove power from you within days?"

Prospective Board Member:

"I think there are worlds where I would take sudden action. I could see myself voting to remove you from the board while the investigation is under way, and letting the staff and business partners know that we're investigating you and a possible outcome is you being fired."

Contracts are filled with many explicit terms and agreements, but I also believe they ~all come with an implicit one: in making this deal we are agreeing not to screw each other over. If, when accepting their board seats, they would have judged that this sudden firing without cause and without explaining anything to the staff would be screwing Altman over, and if they did not bring up this sort of action as a thing they might do before it was time to do so, then they should not have done it.

IV.

I agree that there are versions of "agreeing to work closely together on the crucial project" where I see this as "speak up now or otherwise allow this person into your circle of trust." Once someone is in that circle, you cannot kick them out without notice just because you think you observed stuff that made you change your mind – if you could do that, it wouldn't work as a circle of trust.

I don't think this is a "circle of trust". I think accepting a board seat is an agreement. It is an agreement to be given responsibility, and to use it well and in accordance with good principles. I think there is a principle to give someone a chance to respond and be open about why you are destroying their lives and company before you do so, regardless of context, and you don't forgo that just because they are acting antagonistically toward you. Barring illegal acts or acts of direct violence, you should give someone a chance to respond and be open about why you destroy everything they've built.

Batman shouldn't tell the joker that he's coming for him.

The Joker had killed many people when the Batman came for him. From many perspectives this is currently primarily a disagreement over managing a lot of money and a great company. These two are not similar.

Perhaps you wish to say that Altman is in an equivalent moral position, as his work is directly responsible for risking an AI takeover (as I believe), similar in impact to an extinction event. I think if Toner/MacAulay/etc. believed this, then they should have said so openly far, far earlier, so that their counter-parties in this conflict (and everyone else in the world) were aware of the rules at play.

I don't believe that any of them said this before they were given board seats.


In the most uncharitable case (toward Altman) where they believed he was a lizard, they should probably have opened up an investigation before firing him, and taken some action to prevent him from outvoting them (e.g. just removed him from the board, or added an extra board member).

They claim to have done a thorough investigation. Yet it has produced no written results and they could not provide any written evidence to Emmett Shear. So I do not believe they have done a proper investigation or produced any evidence to others. If they can produce ~no evidence to others, then they should cast a vote of no confidence, fire Altman, implement a new CEO, implement a new board, and quit. I would have respected them more if they had also stated that they did not act honorably in ousting Altman and would be looking for a new board to replace them.

You can choose to fire someone for misbehavior even when you have no legible evidence of misbehavior. But then you have to think about how you can gain the trust of the next person who comes along, who understands you fired the last person with no clear grounds.


Lukas: I think it's a thing that only EAs would think up that it's valuable to be cooperative towards people who you're convinced are deceptive/lack integrity.

Ben: Shame on you for suggesting only your tribe knows or cares about honoring partnerships with people after you've lost trust in them. Other people know what's decent too.

Lukas: I think there's something off about the way you express whatever you meant to express here – something about how you're importing your frame of things over mine and claim that I said something in the language of your frame, which makes it seem more obviously bad/"shameful" than if you expressed it under my frame. 

In any case, I'd understand it if you said something like "shame on you for disclosing to the world that you think of trust in a way that makes you less trustworthy (according to my, Ben's, interpretation)." If that's what you had said, I'm now replying that I hope you no longer think this after reading what I elaborated above.

I keep reading this and not understanding your last reply. I'll rephrase my understanding of our positions.

I think you view the board firing situation as follows: some people who didn't strongly trust Altman were given the power to oust him, came to think he's a lizard (with zero concrete evidence), and then simply got rid of him.

I'm saying that even if that's true, they should have acted more respectfully toward him and honored their agreement to wield the power with care, and so should have given him notice and done a proper investigation (again, given that they have zero concrete evidence).

I think that you view me as trying to extend the principle of charity arbitrarily far (to the point of self-harm), and so you're calling me naive and over-cooperative: a lizard's a lizard, just destroy it.

I'm saying that you should honor the agreement you've made to wield your power well and not cruelly or destructively. It seems to me that it has likely been wielded very aggressively and in a way where I cannot tell that it was done justly. A man was told on Friday that he had been severed from the ~$100B company he had built. He was given no cause, the company was given no cause, it appears as if there was barely any clear cause, and there was no way to make the decision right (were it a mistake). This method currently seems to me both a little cruel and a little power-hungry/unjust, even when I assume the overall call is the correct one.

For you to say that I'm just another EA playing cooperate-bot lands with me as (a) inaccurately calling me naive and rounding my position off to a stupider one, (b) disrespecting all the other people in the world who care about people wielding power well, and (c) kind of saying your tribe is the only one with good people in it. Which I think is a pretty inappropriate reply.

I have provided some rebuttals on a bunch of specific points above. Sorry for the too-long comment.

Comment by Ben Pace (Benito) on Sam Altman's ouster at OpenAI was precipitated by letter to board about AI breakthrough - Reuters · 2023-11-22T23:29:00.368Z · LW · GW

My first take is to bet against this being true, as Emmett Shear said the board's reasoning had nothing to do with a specific safety issue, and the reporters could not get confirmation from any of the people directly involved.

Comment by Ben Pace (Benito) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-22T18:12:49.588Z · LW · GW

Absolutely not. When I make an agreement to work closely with you on a crucial project, if I think you're deceiving me, I will let you know. I will not surprise backstab you and get on with my day. I will tell you outright and I will say it loudly. I may move quickly to disable you if it's an especially extreme circumstance but I will acknowledge that this is a cost to our general cooperative norms where people are given space to respond even if I assign a decent chance to them behaving poorly. Furthermore I will provide evidence and argument in response to criticism of my decision by other stakeholders who are shocked and concerned by it.

Shame on you for suggesting only your tribe knows or cares about honoring partnerships with people after you've lost trust in them. Other people know what's decent too.

Comment by Ben Pace (Benito) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-22T08:00:22.378Z · LW · GW

It entirely depends on the reasoning.

Quick possible examples:

  • "Altman, we think you've been deceiving us and tricking us about what you're doing. Here are 5 documented instances where we were left with a clear impression about what you'd do that is counter to what eventually occurred. I am pretty actively concerned that in telling you this, you will cover up your tracks and just deceive us better. So we've made the decision to fire you 3 months from today. In that time, you can help us choose your successor, and we will let you announce your departure. Also if anyone else in the company should ask, we will also show them this list of 5 events."
  • "Altman, we think you've chosen to speed ahead with selling products to users at the expense of having control over these home-grown alien intelligences you're building. I am telling you that there needs to be fewer than 2 New York Times pieces about us in the next 12 months, and that we must overall slow the growth rate of the company and not 2x in the next year. If either of these are not met, we will fire you, is that clear?"

Generally, not telling the staff why was extremely disrespectful, and not highlighting it to him ahead of time was also uncooperative.

Comment by Ben Pace (Benito) on OpenAI: Facts from a Weekend · 2023-11-22T05:03:07.290Z · LW · GW

I was confused about the counts, but I guess this makes sense if Helen cannot vote on her own removal. Then it's Altman/Brockman/Sutskever v Tasha/D'Angelo.

Pretty interesting that Sutskever/Tasha/D'Angelo would be willing to fire Altman just to prevent Helen from going. They instead could have negotiated someone to replace her. Wouldn't you just remove Altman from the Board, or maybe remove Brockman? Why would they be willing to decapitate the company in order to retain Helen?

Comment by Ben Pace (Benito) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-21T21:21:03.957Z · LW · GW

FTR I am not spending much time calculating the positive or negative direct effect of this firing. I am currently pretty concerned about whether it was done honorably and ethically or not. It looks not to me, and so I oppose it regardless of the sign of the effect.

Comment by Ben Pace (Benito) on Vote on worthwhile OpenAI topics to discuss · 2023-11-21T01:44:15.458Z · LW · GW

I assign more than 20% probability to this claim: the firing of Sam Altman was part of a plan to merge OpenAI with Anthropic.

Comment by Ben Pace (Benito) on Vote on worthwhile OpenAI topics to discuss · 2023-11-21T00:14:26.009Z · LW · GW

I am interested in info-sharing and discussion and hope this poll will help. However, I feel unclear about whether this poll encourages people to "pick their positions" too quickly, while the proverbial fog of war is still high (I feel that way when considering agree/disagree-voting on some of the poll options). I am interested in hearing whether others have that reaction (via react, comment, or DM). My guess is that I am unlikely to take this down, but it will inform whether we do this sort of thing in the future.

Comment by Ben Pace (Benito) on Vote on worthwhile OpenAI topics to discuss · 2023-11-20T23:38:20.195Z · LW · GW

The way this firing has played out so far (through Monday, Nov 20th) is evidence that the non-profit board was effectively unable to fire the CEO.

Comment by Ben Pace (Benito) on Vote on worthwhile OpenAI topics to discuss · 2023-11-20T21:49:33.618Z · LW · GW

Insofar as lawyers are recommending against speaking, the board should probably ignore them.

Comment by Ben Pace (Benito) on Vote on worthwhile OpenAI topics to discuss · 2023-11-20T21:44:02.416Z · LW · GW

I assign >80% probability to this claim: the board should be straightforward with its employees about why they fired the CEO.

Comment by Ben Pace (Benito) on Vote on worthwhile OpenAI topics to discuss · 2023-11-20T21:43:37.460Z · LW · GW

I assign >50% to this claim: The board should be straightforward with its employees about why they fired the CEO.

Comment by Ben Pace (Benito) on Vote on worthwhile OpenAI topics to discuss · 2023-11-20T21:40:36.065Z · LW · GW

Poll For Topics of Discussion and Disagreement

Use this thread to (a) upvote topics you're interested in reading about, (b) agree/disagree with positions, and (c) add new positions for people to vote on.

Comment by Ben Pace (Benito) on OpenAI: Facts from a Weekend · 2023-11-20T19:47:29.007Z · LW · GW

Fun story.

I met Emmett Shear once at a conference, and have read a bunch of his tweeting.

On Friday I turned to a colleague and asked for Shear's email, so that I could email him suggesting he try to be CEO, as he's built a multi-billion-dollar company before and has his head screwed on right about x-risk.

My colleague declined, I think they thought it was a waste of time (or didn't think it was worth their social capital).

Man, I wish I had done it, that would have been so cool to have been the one to suggest it to him.

Comment by Ben Pace (Benito) on Social Dark Matter · 2023-11-20T03:25:36.836Z · LW · GW

Mm, perhaps rather than saying that most such people are untrustworthy, I just want to instead make an argument about risk and the availability of evidence.

  1. Some people are very manipulative and untrustworthy and covertly break widespread social norms.
  2. Some people covertly break widespread social norms for good reasons.
  3. Even if you find out about one instance of someone covertly breaking a norm, you do not know how many other norms they are covertly breaking, and it's hard to understand the reasoning behind the one instance you have learned about.

Suppose the amount of covert social norm breaking is heavy-tailed, where 90% of people break none, 8% of people break 1, 1% of people break 2-3, and 1% of people break 4+ (and are doing it all the time).

If you find out that someone breaks one, then you learn that they're not in the first bucket, and this is a 10x multiplier toward them being the sort of person who breaks 4+ (from a 1% prior to a 10% posterior). So this is pretty scary.
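The multiplier falls straight out of a Bayes update over these buckets. A minimal sketch (assuming the bucket probabilities given above; the labels are just illustrative):

```python
# Assumed prior over how many widespread norms a person covertly breaks,
# taken from the distribution described above.
priors = {"0": 0.90, "1": 0.08, "2-3": 0.01, "4+": 0.01}

# Observation: the person covertly breaks at least one norm.
# Every bucket except "0" is consistent with this observation.
p_observation = sum(p for bucket, p in priors.items() if bucket != "0")

# Posterior probability of the heavy-tailed "4+" bucket, given the observation.
posterior_heavy = priors["4+"] / p_observation

print(round(p_observation, 2))    # 0.1
print(round(posterior_heavy, 2))  # 0.1, i.e. 10x the 1% prior
```

The same update applies to the "2-3" bucket, so conditional on one observed violation, a full fifth of the probability mass sits on repeat offenders.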

And what's worse is regardless of which bucket they're in, they're not going to tell you which bucket they're in. Because they're not going to volunteer to you info about other norms they're breaking.

So (if this model/distribution is accurate) when you find out that someone has covertly broken a widespread social norm, you need to suddenly have your guard up, and to be safe you should probably apply a high standard before feeling confident that the person is not also violating other norms that you care about and keeping that from you.

(I just want to acknowledge in my comments I'm doing a lot of essentialism about people's long-standing personality traits, I'm not sure I'd endorse that if I reflected longer.)

Comment by Ben Pace (Benito) on Social Dark Matter · 2023-11-20T02:43:43.756Z · LW · GW

Here are two sentences that I think are both probably true.

  1. In order to do what is right, at some point in a person's life they will have to covertly break certain widespread social norms.
  2. Most people who covertly break widespread social norms are untrustworthy people.

(As a note on my epistemic state: I assign a higher probability to the first claim being true than the second.)

One of the things I read the OP as saying is "lots of widespread social norms are very poorly justified by using extreme cases and silencing all the fine cases (and you should fix this faulty reasoning in your own mind)". I can get behind this. I think it's also saying "Most people are actually covertly violating widespread social norms in some way". I am genuinely much more confused about this. Many of the examples in the OP are more about persistent facts about people's builds (e.g. whether they have violent impulses or whether they are homosexual) than about their active choices (e.g. whether they carry out violence or whether they had homosexual sex).

For instance I find myself sympathetic to arguments where people say that many people would prefer to receive corporal punishment than be imprisoned for a decade, but if I were to find out that one particular prison was secretly beating the prisoners and then releasing them, I would be extremely freaked out by this. (This example doesn't quite make sense because that just isn't a state of affairs that you could keep quiet, but hopefully it conveys the gist of what I mean.)

Comment by Ben Pace (Benito) on Social Dark Matter · 2023-11-20T02:16:59.223Z · LW · GW

(Remember: if, after thirty seconds of conscious awareness and deliberate thought, you come to the conclusion "no, this actually is bad, I should be on the warpath," you can always ramp right back up again!  Any panic that can be destroyed by a mere thirty seconds of slow, deep breathing is probably panic you didn't want in the first place, and it's pretty rare that literally immediate action is genuinely called-for, such that you can't afford to take the thirty seconds.)

This is interesting. I hope it's true. I'm not certain that, in general, if I successfully tamp down my flared-up anger or rage, I will be able to straightforwardly bring it up again. If my emotions behave rationally and in response to my situation, then it's true, but people have recently been arguing to me that the coming and going of emotions is a much more random process, influenced by chemicals and immediate environment and so on.

I think I'll accept it as probably true, but look out for evidence of this failing.

Comment by Ben Pace (Benito) on Sam Altman fired from OpenAI · 2023-11-18T17:26:19.821Z · LW · GW

It read like propaganda to me, whether the person works at the company or not.

Comment by Ben Pace (Benito) on Sam Altman fired from OpenAI · 2023-11-17T21:51:19.674Z · LW · GW

Also D'Angelo is on the board of Asana, Moskovitz's company (Moskovitz who funds Open Phil).

Comment by Ben Pace (Benito) on Social Dark Matter · 2023-11-17T18:15:32.157Z · LW · GW

Appreciate the link. Based on some of the people and their stories, I'm updating toward the view that strategically breaking certain strongly enforced norms does not generally correlate with a broader disregard for decency. I think I'm also substantially updating about how much recognition/acceptance of homosexuality there was in the early 1900s: there was a very successful theater production called The Captive, about a lesbian, that had famous actors and ~160 showings (until it was cancelled due to its subject being scandalous).

A curious quote about speakeasies, 8 minutes into the documentary. I'm not sure what to make of it directionally about rule-breakers at the time, or how to update about their motives.

The main thought behind the thing was to break the law, and live as wildly as you could. And everybody did. Because the Speakeasies were all over the town. Even the old residences, some of them had Speakeasies in the basement. Now a lot of people write about prohibition, but they don't bring out the fact that everybody was breaking the law because it was the thing to do.

Comment by Ben Pace (Benito) on Social Dark Matter · 2023-11-17T17:37:59.716Z · LW · GW

I agree it is also Bayesian evidence for that! My current guess is that it points more in the other direction, as in general I think more people break rules for bad reasons than for good reasons; but I'm not that confident, and I'd be interested in hearing from someone who disagrees about this (in this specific case or in general) and why.

Comment by Ben Pace (Benito) on Social Dark Matter · 2023-11-17T05:57:43.022Z · LW · GW

The mere fact of being gay (whilst being otherwise well-behaved and in compliance with all social norms and standards, such that most people never even noticed) is not a major risk factor for child molestation, and is not evidence that Mr. So-and-So was actually a ticking time bomb all along and we simply never knew, thank God we got rid of him before he fiddled with somebody all of a sudden after never doing anything of the sort for thirty years.

I think... finding out (in the 1950s) that someone maintained many secret homosexual relationships for many years is actually a signal that the person is fairly devious, and is both willing and able to behave in ways that society has strong norms against.

It obviously isn't true of homosexuals once the norm was lifted, but my guess is that at the time it was accurate to make a directional Bayesian update that the person had behaved in actually bad and devious ways.

Edit: From looking through some of a YouTube documentary linked below, I updated that many of these people seemed pretty harmless and kindly. So I think there's a good chance I'm wrong in this case.

Comment by Ben Pace (Benito) on Social Dark Matter · 2023-11-17T05:47:57.181Z · LW · GW

Liking the score from the movie "Titanic" as a seventh-grade boy in N.C. in 1998.

(I don't know why, but I enjoy reading Duncan quietly slipping in references to situations that bothered him 25 years ago, all the more-so when there's vanishingly little chance that anyone involved will ever read him mention it.)