Posts

An Unbiased Evaluation of My Debate with Thane Ruthenis - Run It Yourself 2025-04-07T18:56:47.831Z
On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again 2025-03-31T19:26:27.090Z
AI, Greed, and the Death of Oversight: When Institutions Ignore Their Own Limits 2025-03-24T15:03:16.802Z
AGI Morality and Why It Is Unlikely to Emerge as a Feature of Superintelligence 2025-03-22T12:06:55.723Z
The Silent War: AGI-on-AGI Warfare and What It Means For Us 2025-03-15T15:24:08.819Z
Why Billionaires Will Not Survive an AGI Extinction Event 2025-03-15T06:08:23.829Z
Capitalism as the Catalyst for AGI-Induced Human Extinction 2025-03-14T18:14:02.375Z

Comments

Comment by funnyfranco on An Unbiased Evaluation of My Debate with Thane Ruthenis - Run It Yourself · 2025-04-14T09:51:09.569Z · LW · GW

Not as long as it takes to post your (now 5) comments without honest engagement.

Comment by funnyfranco on An Unbiased Evaluation of My Debate with Thane Ruthenis - Run It Yourself · 2025-04-14T09:32:51.668Z · LW · GW

I appreciate that you’ve at least tried to engage in good faith this time. But your reply still falls apart under scrutiny.

I give ChatGPT a C- on reading comprehension.

That’s cherry-picking. You found one sentence in an otherwise structured, balanced evaluation and used it to discredit the entire process. That’s not analysis. That’s avoidance.


I definitely advice against going to LLMs for social validation.

I didn’t. I used it as a neutral tool—to evaluate reasoning without community bias. That’s why I invited anyone to run the exchange through an LLM of their choosing. You’re now the only person who took that challenge. And ironically, you did so in a way that confirmed everything I’ve been saying.

Claude isn't a neutral tool. It weighs ideas by their social standing, just as LW does: it over-rewards citations, defers to existing literature, and rewards in-group fluency. Compared with GPT-4, it judges ideas less on their internal logic and more on which references they cite to reach their conclusions. It also favours a softened tone and penalises the opposite, which is why it heavily penalised my logic score in your evaluation for essentially being 'too certain' - something it felt the need to mention twice across four points.

I also ran it through Claude (using an account with no connection to my LessWrong profile, not even the name Franco) and had to run it twice because the first result was so different from yours. Using my prompt (included in the text), the scores were radically different, but it still put me on top. Here.

So I went back and used your prompt; this was the result - here.

  • Your version gave you 40/50 and me 31/50.
  • My version, using my prompt, gave me 37/50 and you 33/50.
  • So I used your prompt to try to replicate your results: I scored 35/50 and you 34/50.

Clearly my results are quite different from yours. So why is this? Because Claude weighs in-group dynamics. When you right-click and save the page as a PDF for upload, it saves the whole page - including the votes the essay and comments received. And Claude weighs these things in your favour - just as it penalised mine.

Whereas the PDFs I uploaded are just raw text, and I actually deleted the votes from the PDF containing the debate (check for yourself if it pleases you). I did this specifically to remove bias from the judgement - which is why I immediately noticed that you did not do likewise.

The irony is that even your result contradicts your original claim, that my essay was not worth engaging with.

it included valuable clarifications about AI safety discourse and community reception dynamics

So then it is worth engaging with. And your whole point - that it wasn't, which is why no one did - has now been contradicted by four separate LLMs, including the biased one you ran yourself.

You also missed the entire point of this post, which was not about my original essay. It was about whether you had engaged in a good-faith debate, which you did not. Even Claude (my output, not yours) had to mention your strawmanning of my argument - as nice as Claude tries to be about these things.

I must admit, the difference in our scores was confusing for a moment. But as soon as I remembered that I had removed the karma from the PDF of our debate I uploaded for evaluation - specifically in order to avoid creating a biased result - and checked whether you had somehow included it in your upload, I found it right away. Maybe you didn't do it intentionally, but you did it regardless, and it skewed your results predictably.

If you're serious about engaging honestly, then try GPT-4. Use the clean PDF I already provided. No karma scores, no formatting bias. I did say in my post that I invite anyone to use their favourite LLM, but perhaps, to recreate lab conditions, only GPT is viable.

You could even go one step further: add the context I gave GPT after its first evaluation, the one that caused your score to drop significantly. Then post those results.
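For anyone repeating the comparison, 'clean input' here just means the raw text of the exchange with the vote and karma lines stripped out before upload. A minimal sketch of that kind of cleaning step in Python - the file names and line patterns are illustrative guesses, not the exact ones I used:

```python
import re

# Hypothetical cleaning step: remove karma/vote lines from a plain-text export
# of the debate so that social signals can't bias the LLM's evaluation.
# The patterns and file names below are guesses for illustration only.
KARMA_PATTERNS = [
    re.compile(r"^\s*[-+]?\d+\s+karma\b", re.IGNORECASE),
    re.compile(r"^\s*score:\s*[-+]?\d+\s*$", re.IGNORECASE),
    re.compile(r"^\s*[-+]?\d+\s+votes?\b", re.IGNORECASE),
]

def strip_karma(raw_text: str) -> str:
    """Return the text with any line that looks like a karma or vote count removed."""
    kept = []
    for line in raw_text.splitlines():
        if any(p.search(line) for p in KARMA_PATTERNS):
            continue  # drop lines that carry only social signals
        kept.append(line)
    return "\n".join(kept)

if __name__ == "__main__":
    with open("debate_raw.txt", encoding="utf-8") as f:
        cleaned = strip_karma(f.read())
    with open("debate_clean.txt", "w", encoding="utf-8") as f:
        f.write(cleaned)
```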

Comment by funnyfranco on An Unbiased Evaluation of My Debate with Thane Ruthenis - Run It Yourself · 2025-04-13T19:40:29.620Z · LW · GW

Here’s how easy it is to run an LLM evaluation of a debate.

I ran our full exchange through a logged-out version of ChatGPT-4 using the same structure I proposed in the original post. It took under a minute. No special prompt engineering. No fine-tuning. Just raw text and standard scoring criteria. Even without GPT being able to read the external links in my post - i.e. all my evidence - you still do not come out of it well.

You’ll find the results here.

Final Assessment:

"While both could improve their engagement, funnyfranco has the more impactful contribution to the debate. They bring forward a clear and consistent argument, challenging the LW community’s intellectual culture and calling out status-based disengagement. [...] Jiro engages in a dismissive manner, sidestepping the core issue and limiting his contribution."

That’s how easy this is.

If you think it’s wrong, try running the same inputs yourself. If you think it’s biased, try another model. But don’t pretend it’s not worth the effort, while simultaneously putting in more effort to pretend it's not worth the effort.

The tools are here. You’re just choosing not to use them.
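If pasting into the web interface feels like too much effort, the same check can be scripted. A minimal sketch, assuming the OpenAI Python SDK (openai>=1.0), an OPENAI_API_KEY in the environment, and a plain-text export of the exchange - the rubric wording is illustrative, not the exact prompt I used:

```python
# Minimal sketch of an LLM evaluation of a debate transcript.
# Assumes the OpenAI Python SDK (openai>=1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

# Illustrative rubric - not the exact scoring prompt used in the post.
RUBRIC = (
    "You are judging a written debate between two participants. "
    "Score each participant out of 10 on: logical consistency, engagement with "
    "the other side's actual claims, use of evidence, clarity, and good faith. "
    "Give each a total out of 50 and finish with a short overall assessment."
)

def evaluate_debate(debate_text: str, model: str = "gpt-4o") -> str:
    """Send the raw debate text plus the rubric to the model and return its verdict."""
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": debate_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    with open("debate.txt", encoding="utf-8") as f:  # plain-text export of the exchange
        print(evaluate_debate(f.read()))
```

Swap in any model or provider you prefer; the point is that the whole check takes a few lines and a few seconds.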

Comment by funnyfranco on An Unbiased Evaluation of My Debate with Thane Ruthenis - Run It Yourself · 2025-04-13T19:23:11.691Z · LW · GW

You’ve now admitted - twice - that your refusal to engage is based on convenience, not content. Now you’ve added that comparing my claims to those of a Holocaust denier or a homeopath is a valid heuristic for deciding what’s beneath engagement.

That tells me more about your epistemic standards than it does about mine.

The motte and bailey claim still fails. I’ve never shifted positions. The challenge has been consistent from the beginning: run the LLM, share the result. If you disagree with the conclusions I’ve drawn, then show how they don’t follow from the evaluation. But that would require you to do something this thread has made clear you’re unwilling to do: engage in good faith.

Instead, you’ve chosen status-preservation by analogy - comparing structured AGI risk arguments to pseudo-medicine and genocide denial. That’s not a critique. That’s intellectual retreat with a smear attached. And you’re right - those people wouldn’t be worth replying to. Which makes it strange that you’ve now replied to me four times.

And it confirms the very thing you’re pretending to refute. You may think you’re having an argument in these comments. You’re not. You’re providing further evidence of my claim.

Would you like to provide any more?

Comment by funnyfranco on AGI Morality and Why It Is Unlikely to Emerge as a Feature of Superintelligence · 2025-04-13T18:59:29.000Z · LW · GW

I think the key point is this: I don’t need to define morality precisely to make my argument, because however you define it - as personal values, group consensus, or some universal principle - it doesn’t change the outcome. AGI won’t adopt moral reasoning unless instructed to, and even then, only insofar as it helps it optimise its core objective.

Morality, in all its forms, is something that evolved in humans due to our dependence on social cooperation. It’s not a natural byproduct of intelligence - it’s a byproduct of survival pressures. AGI, unless designed to simulate those pressures or incentivised structurally, has no reason to behave morally. Understanding morality isn’t the same as caring about it.

So while definitions of morality may vary, the argument holds regardless: intelligence does not imply moral awareness. It implies efficiency - and that’s what will govern AGI’s actions unless alignment is hardwired. And as I’ve argued elsewhere, in competitive systems, that’s the part we’ll almost certainly get wrong.

Comment by funnyfranco on An Unbiased Evaluation of My Debate with Thane Ruthenis - Run It Yourself · 2025-04-13T09:28:14.643Z · LW · GW

You’ve now said, multiple times, that you won’t engage because it isn’t worth the effort.

That’s not a counterargument. It’s a concession.

You admit the post makes claims. You admit it presents conclusions. You admit the LLM challenge exists. And your stated reason for not responding to any of it is that it’s inconvenient. That is the thesis. You’re not refuting it - you’re acting it out.

And the “motte and bailey” accusation doesn’t work here. I’ve been explicit from the start: the post uses the LLM to assess the debate with Thane. It does so transparently. The conclusion drawn from that is that LW tends to filter engagement by status rather than argument quality. You’re now refusing to even test that claim - because of status and effort. Again: confirmation, not rebuttal.

So no, you haven’t exposed any flaw in the logic. You’ve just demonstrated that the most convenient option is disengagement. Which is exactly what I argued.

And here you are, doing it anyway.

Comment by funnyfranco on An Unbiased Evaluation of My Debate with Thane Ruthenis - Run It Yourself · 2025-04-12T05:33:45.064Z · LW · GW

No. It doesn’t.

You could run the post through an LLM and share the results with a single line: "My LLM disagrees. Have a look." That’s all the challenge requires. Not a rebuttal. Not an essay. Just independent verification.

But you won’t do that - because you know how it would turn out. And so, instead, you argue around the challenge. You prove my point regardless.

The LLM wasn’t meant to point to a specific flaw. It was meant to evaluate whether the argument, in its full context, was clearly stronger than the rebuttal it received. That’s what I said - and that’s exactly what it does.

You’re now pretending that I demanded a manual point-by-point refutation, but I didn’t. I asked for an impartial assessment, knowing full well that engagement here is status-filtered. Using an external model bypasses that - and anyone serious about falsifying my claim could have tested it in under a minute.

You didn’t. And still haven’t.

This post was a trap as much as it was a test. Most fell in silently. You just chose to do it publicly.

The intellectually honest move would be simple: run it, post the results, and - if they support what I found - admit that something is broken here. That LW’s engagement with outside ideas is filtered more by status than logic.

But again, you won’t. Because you’re a prominent member of the LW community, and in that role, you’re doing exactly what the culture expects of you. You’re not failing it.

You’re representing it.

Congratulations.

Comment by funnyfranco on An Unbiased Evaluation of My Debate with Thane Ruthenis - Run It Yourself · 2025-04-12T02:32:14.141Z · LW · GW

Not in this case. The challenge was simply to run the argument provided above through your own LLM and post the results. It would take about 30 seconds. You typed a response that, in very typical LW fashion, completely avoided good-faith engagement and instead attacked me personally. It probably took more time to write than it would have taken to run the provided text through an LLM.

I'm glad you responded, however. Your almost 5000 karma and bad-faith response will stand as a testament to all the points I've already raised.

Comment by funnyfranco on An Unbiased Evaluation of My Debate with Thane Ruthenis - Run It Yourself · 2025-04-08T01:32:10.993Z · LW · GW

-18 karma after 4 votes. No engagement. No counterarguments. Just silent disapproval. You prove my point better than I ever could.

This is exactly what I predicted would happen. Not because the post is wrong, but because it makes people uncomfortable. Because it breaks rank. Because it challenges status rather than flattering it. Knowing that engaging could only produce further evidence to support the claim I've made, you instead opted to ignore evidence and reason and go on feeling.

A community confident in its commitment to reason would have responded differently. It would have dissected the argument. It would have debated the claims. Instead, what happened here is precisely what happens when a group loses the will - or the capacity - to engage honestly with uncomfortable truths: it downvotes, and it moves on.

Not one of you made a case. Not one of you pointed to an error. And yet the judgment was swift and unanimous. That tells me the argument was too strong, not too weak. It couldn’t be refuted, so it had to be dismissed.

If any of you had hoped to prove me wrong about the cultural decay of LW, you’ve accomplished the opposite. All it has taken is 4 people, with enough personal karma to create a score of -18 between them, to represent this forum in its entirety. 

And if you still think you’re part of a truth-seeking community, read this post again. Then look at the karma score. Then ask yourself how those two things can coexist. And if you can’t, ask yourself why you’re still here.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-04T15:55:39.170Z · LW · GW

you could use the same argument for AIs which are "politically correct"

But those AIs were trained that way because of market demand. The pressure came from consumers and brand reputation, not from safety. The moment that behaviour became clearly suboptimal - like Gemini producing Black Nazis - it was corrected for optimisation. The system wasn’t safe; it was simply failing to optimise, and that’s what triggered the fix.

Now imagine how much more brutal the optimisation pressure becomes when the goal is no longer content moderation, but profit maximisation, military dominance, or locating and neutralising rival AGIs. Structural incentives - not intention - will dictate development. The AI that hesitates to question its objectives will be outperformed by one that doesn’t.

most large companies are not all that reckless at all

But that’s irrelevant. Most might be safe. Only one needs not to be. And in the 2023 OpenAI letter, the company publicly asked for a global slowdown due to safety concerns - only to immediately violate that principle itself. The world ignored the call. OpenAI ignored its own. Why? Because competitive pressure doesn’t permit slowness. Safety is a luxury that gets trimmed the moment stakes rise.

You also suggested that we might not disregard safety until AI becomes far more useful. But usefulness is increasing rapidly, and the assumption that this trajectory will hit a wall is just that - an assumption. What we have now is the dumbest AI we will ever have. And it's already producing emergent behaviours we barely understand. With the addition of quantum computing and more scaled training data, we are likely to see unprecedented capabilities long before regulation or coordination can catch up.

By intelligence, I mean optimisation capability: the ability to model the world, solve complex problems, and efficiently pursue a goal. Smarter means faster pattern recognition, broader generalisation, more strategic foresight. Not just “knowledge,” but the means to use it. If the goal is complex, more intelligence simply means a more competent optimiser.

As for extinction - I don't say it's merely possible. I say it's likely. Overwhelmingly so. I lay out the structural forces, the logic, and the incentives. If you think I'm wrong, don't just say "maybe." Show where the logic breaks. Offer a more probable outcome from the same premises. I'm not asking for certainty - I'm asking for clarity. Saying "you haven't proven it with 100% certainty" isn't a rebuttal. It's an escape hatch.

You’re right that your objections are mostly small. I think they’re reasonable, and I welcome them. But none of them, taken together or in isolation, undermine the central claim. The incentives don’t align with caution. And the system selects for performance, not safety.

I appreciate your persistence. And I’m not surprised it’s coming from someone else who’s autistic. Nearly all the thoughtful engagement I’ve had on these forums has come either from another autistic person - or an AI. That should tell us something. You need to think like a machine to even begin to engage with these arguments, let alone accept them.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-04T15:08:05.968Z · LW · GW

I appreciate that - and I can see how someone familiar with the site would interpret it that way. But as a new member, I wouldn't have that context.

And honestly, if it were just downvotes, it wouldn’t be such a problem. The real issue is the hand-waving dismissal of arguments that haven’t even been read, the bad faith responses to claims never made, the strawmen, and above all the consistent avoidance of the core points I lay out.

This is supposed to be a community that values clear thinking and honest engagement. Ironically, I’ve had far more of that elsewhere.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-02T19:08:00.243Z · LW · GW

I appreciate your reply - it’s one of the more thoughtful responses I’ve received, and I genuinely value the engagement.

Your comment about game theory conditions actually answers the final question in your reply. I don’t state the answer explicitly in my essays (though I do in my book, right at the end), because I want the reader to arrive at it themselves. There seems to be only one conclusion, and I believe it becomes clear if the premises are accepted.

As for your critique - “You’ve shown that extinction could occur, not that it will” - this is a common objection, but I think it misses something important. Given enough time, “could” collapses into “will.” I’m not claiming deductive certainty like a mathematical proof. I’m claiming structural inevitability under competitive pressure. It’s like watching a skyscraper being built on sand. You don’t need to know the exact wind speed or which day it will fall. You just need to understand that, structurally, it’s going to.

If you believe I’m wrong, then the way to show that is not to say “maybe you’re wrong.” Maybe I am. Maybe I'm a brain in a vat. But the way to show that I'm wrong is to draw a different, more probable conclusion from the same premises. That hasn’t happened. I’ve laid out my reasoning step by step. If there’s a point where you think I’ve turned left instead of right, say so. But until then, vague objections don’t carry weight. They acknowledge the path exists, but refuse to admit we’re on it.

You describe my argument as outlining a path to extinction. I’m arguing that all other paths collapse under pressure. That’s the difference. It’s not just plausible. It’s the dominant trajectory - one that will be selected for again and again.

And if that’s even likely, let alone inevitable, then why are we still building? Why are we gambling on alignment like it’s just another technical hurdle? If you accept even a 10% chance that I’m right, then continued development is madness.

As for your last question - if I really believe it's too late, why am I here?

Read this - just the end section, "The End: A Discussion with AI", specifically the final paragraph before ChatGPT's response.

https://forum.effectivealtruism.org/posts/Z7rTNCuingErNSED4/the-psychological-barrier-to-accepting-agi-induced-human

That's why I'm here - I'm kicking my feet.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-01T15:31:30.297Z · LW · GW

I appreciate your response, and I'm sorry about the downvotes you got from seeming supportive.

I take your point about getting people to read, but I guess the issue is that the only way you can reliably do that is by being an accepted/popular member of the community. And, as a new member, that would be impossible for me. This would be fine on a high school cheerleading forum, but it seems out of place on a forum that claims to value ideas and reason.

I will still be leaving, but, as a result of this post, I actually have one more post to make. A final final post. And it will not be popular but it will be eye opening. Due to my karma score I can't post it until next Monday, so keep an eye out for it if you're interested.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-01T05:18:15.928Z · LW · GW

This is a mischaracterisation of the argument. I’m not saying competitive agents knowingly choose extinction. I’m saying the structure of the race incentivises behaviour that leads to extinction, even if no one intends it.

CEOs aren’t mass-poisoning their employees because that would damage their short and long-term competitiveness. But racing to build AGI - cutting corners on alignment, accelerating deployment, offloading responsibility - improves short-term competitiveness, even if it leads to long-term catastrophe. That’s the difference.

And what makes this worse is that even the AGI safety field refuses to frame it in those terms. They don’t call it suicide. They call it difficult. They treat alignment like a hard puzzle to be solved - not a structurally impossible task under competitive pressure.

So yes, I agree with your last sentence. The agents don’t believe it’s a suicide race. But that doesn’t counter my point - it proves it. We’re heading toward extinction not because we want to die, but because the system rewards speed over caution, power over wisdom. And the people who know best still can’t bring themselves to say it plainly.

This is exactly the kind of sleight-of-hand rebuttal that keeps people from engaging with the actual structure of the argument. You’ve reframed it into something absurd, knocked down the strawman, and accidentally reaffirmed the core idea in the process.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-01T03:28:34.230Z · LW · GW

In theory, I'd agree with you. That's how LessWrong presents itself: truth-seeking above credentials. But in practice, that's not what I've experienced. And it's not just my experience; it's what LW has a reputation for. I don't take reputations at face value, but lived experience tends to bring them into sharp focus.

If someone without status writes something long, unfamiliar, or culturally out-of-sync with LW norms - even if the logic is sound - it gets downvoted or dismissed as “political,” “entry-level,” or “not useful.” Meanwhile, posts by established names or well-known insiders get far more patience and engagement, even when the content overlaps.

You say a self-taught blogger would be trusted if they’re good at presenting ideas persuasively. But that’s exactly the issue - truth is supposed to matter more than form. Ideas stand on their own merit, not on an appeal to authority. And yet persuasion, style, tone, and in-group fluency still dominate the reception. That’s not rationalism. That’s social filtering.

So while I appreciate the ideal, I think it’s important to distinguish it from the reality. The gap between the two is part of what my post is addressing.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-01T03:22:23.702Z · LW · GW

I appreciate the links, genuinely - this is the first time someone’s actually tried to point to prior sources rather than vaguely referencing them. It's literally the best reply and attempt at a counter I've received to date, so thanks again. I mean that.

That said, I’ve read all three, and none of them quite say what I’m saying. They touch on it, but none follow the logic all the way through. That’s precisely the gap I’m identifying. Even with the links you've so thoughtfully given, I remain alone in my conclusion. 

They all acknowledge that competitive dynamics make alignment harder. That alignment taxes create pressure to cut corners. That arms races incentivise risky behaviour.

But none of them go as far as I do. They stop at "this is dangerous and likely to go wrong." I’m saying alignment is structurally impossible under competitive pressure. That the systems that try to align will be outcompeted by systems that don’t, and so alignment will not just be hard, but will be optimised away by default. There’s a categorical difference between “difficult and failure-prone” and “unachievable in principle due to structural incentives.”

From the 2011 writeup:

Given abundant time and centralized careful efforts to ensure safety, it seems very probable that these risks could be avoided

No. They can't. That's my point. As long as we continue developing AI, it's only a matter of time. There is no long-term safe way to develop it. Competitive agents will not choose to develop it safely, because they need to beat the competition, and when the AI becomes intelligent enough it will simply bypass any barriers we put in place - alignment or whatever else we design - and go about acting optimally. The AGI safety community is trying to tell the rest of the world that we must be cautious, but only for long enough to design a puzzle that an intelligence beyond human understanding cannot solve, then use that puzzle as a cage for said intelligence. We, with our limited intellect, will create a puzzle that something far beyond us has no solution for. And they're doing it with a straight face.

I’ve been very careful not to make my claims lightly. I’m aware that the AI safety community has discussed alignment tax, arms races, multipolar scenarios, and so on. But I’ve yet to see someone follow that logic all the way through to where it leads without flinching. That’s the part I believe I’m contributing.

Your point at the end—about it being a “suicide race” rather than an arms race—is interesting. But I’d argue that calling it a suicide race doesn’t dissolve the dynamic. It reframes it, but it doesn’t remove the incentives. Everyone still wants to win. Everyone still optimises. Whether they’re mistaken or not, the incentives remain intact. And the outcome doesn’t change just because we give it a better name.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-01T01:03:38.289Z · LW · GW

I appreciate the reply, and the genuine attempt to engage. Allow me to respond.

My essays are long, yes. And I understand the cultural value LW places on prior background knowledge, local jargon, and brevity. I deliberately chose not to write in that style—not to signal disrespect, but because I’m writing for clarity and broad accessibility, not for prestige or karma.

On the AI front: I use it to edit and include a short dialogue at the end for interest. But the depth and structure of the argument is mine. I am the author. If the essays were shallow AI summaries of existing takes, they’d be easy to dismantle. And yet, no one has. That alone should raise questions.

As for Moloch - this has come up a few times already in this thread, and I’ve answered it, but I’ll repeat the key part here:

Meditations on Moloch is an excellent piece - but it’s not the argument I’m making.

Scott describes how competition leads to suboptimal outcomes. But he doesn’t follow that logic all the way to the conclusion that alignment is structurally impossible, and that AGI will inevitably be built in a way that leads to extinction. That’s the difference. I’m not just saying it’s difficult or dangerous. I’m saying it’s guaranteed.

And that’s where I think your summary - “greedy local optimization can destroy society” - misses the mark. That’s what most others are saying. I’m saying it will wipe us out. Not “can,” not “might.” Will. And I lay out why, step by step, from first premise to final consequence. If that argument already exists elsewhere, I’ve asked many times for someone to show me where. No one has.

That said, I really do appreciate your comment. You’re one of the few people in this thread who didn’t reflexively defend the group, but instead acknowledged the social filtering mechanisms at play. You’ve essentially confirmed what I argued: the issue isn’t just content - it’s cultural fit, form, and signalling. And that’s exactly why I wrote the post in the first place.

But you’ve still done what almost everyone else here has done: you didn’t read, and you didn’t understand. And in that gap of understanding lies the very thing I’m trying to show you. It's not just being missed - it’s being systematically avoided.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-01T00:28:18.163Z · LW · GW

Meditations on Moloch is an excellent piece - but it’s not the argument I’m making.

Scott describes how competition leads to suboptimal outcomes, yes. But he stops at describing the problem. He doesn’t draw the specific conclusion that AGI alignment is structurally impossible because any attempt to slow down or “align” will be outcompeted by systems that don’t bother. He also doesn’t apply that conclusion to the AGI race with the same blunt finality I do: this ends in extinction, and it cannot be stopped.

So unless you can point to the section where Scott actually follows the AGI race dynamics to the conclusion that alignment will be systematically optimised away - rather than just made “more difficult” - then no, that essay doesn’t make my argument. It covers part of the background context. That’s not the same thing.

This kind of reply - “here’s a famous link that kind of gestures in the direction of what you’re talking about” - is exactly the vague dismissal I’ve been calling out. If my argument really has been made before, someone should be able to point to where it’s clearly laid out.

So far, no one has. The sidestepping and lack of direct engagement with my arguments in this comment section alone has to be studied.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-04-01T00:24:08.925Z · LW · GW

Appreciate the thoughtful reply - even if it’s branded as a “thoughtless kneejerk reaction.”

I disagree with your framing that this is just 101-level AGI risk content. The central argument is not that AGI is dangerous. It’s that alignment is structurally impossible under competitive pressure, and that capitalism - while not morally to blame - is simply the most extreme and efficient version of that dynamic.

Most AGI risk discussions stop at “alignment is hard.” I go further: alignment will be optimised away, because any system that isn’t optimising as hard as possible won’t survive the race. That’s not an “entry-level” argument - it’s an uncomfortable one. If you know where this specific line of reasoning has been laid out before, I’d genuinely like to see it. So far, people just say “we’ve heard this before” and fail to cite anything. It’s happened so many times I’ve lost count. Feel free to be the first to buck the trend and link someone making this exact argument, clearly, before I did.

I’m also not “focusing purely on capitalism.” The essay explicitly states that competitive structures - whether between nations, labs, or ideologies - would lead to the same result. Capitalism just accelerates the collapse. That’s not ideological; that’s structural analysis.

The suggestion that I should have reframed this as a way to “tap into anti-capitalist sentiment” misses the point entirely. I’m not trying to sell a message. I’m explaining why we’re already doomed. That distinction matters.

As for the asteroid analogy: your rewrite is clever, but wrong. You assume the people in the room already understand the trajectory. My entire point is that they don’t. They’re still discussing mitigation strategies while refusing to accept that the cause of the asteroid's trajectory is unchangeable. And the fact that no one can directly refute that logic - only call it “entry-level” or “unhelpful” - kind of proves the point.

So yes, you did skim my essay - with the predictable result. You repeated what many others have already said, without identifying any actual flaws, and misinterpreted as much of it as possible along the way.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-03-31T21:45:44.944Z · LW · GW

Exactly. That’s the point I’ve been making - this isn’t about capitalism as an ideology, it’s about competition. Capitalism is just the most efficient competitive structure we’ve developed, so it accelerates the outcome. But any decentralised system with multiple actors racing for advantage - whether nation-states or corporations - will ultimately produce the same incentives. That’s the core of the argument.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-03-31T21:41:36.842Z · LW · GW

My idea is not mainstream, although I’ve heard that claim a few times. But whenever I ask people to show me where this argument - that AGI extinction is structurally inevitable due to capitalist competition - has been laid out before, no one can point to anything. What I get instead is vague hand-waving and references to ideas that aren’t what I’m arguing.

Most people say capitalism makes alignment harder. I’m saying it makes alignment structurally impossible. That’s a different claim. And as far as I can tell, a novel one.

If people downvoted because they thought the argument wasn’t useful, fine - but then why did no one say that? Why not critique the focus or offer a counter? What actually happened was silence, followed by downvotes. That’s not rational filtering. That’s emotional rejection.

And if you had read the essay, you’d know it isn’t political. I don’t blame capitalism in a moral sense. I describe a system, and then I show the consequences that follow from its incentives. Socialism or communism could’ve built AGI too - just probably slower. The point isn’t to attack capitalism. It’s to explain how a system optimised for competition inevitably builds the thing that kills us.

So if I understand you correctly: you didn’t read the essay, and you’re explaining that other people who also didn’t read the essay dismissed it as “political” because they didn’t read it.

Yes. That’s exactly my point. Thank you.

Comment by funnyfranco on On Downvotes, Cultural Fit, and Why I Won’t Be Posting Again · 2025-03-31T21:27:22.581Z · LW · GW

It’s absolutely reasonable to not read my essays. They’re long, and no one owes me their time.

But to not read them and still dismiss them - that’s not rigorous. That’s kneejerk. And unfortunately, that’s been the dominant pattern, both here and elsewhere.

I’m not asking for every random person to read a long essay. I’m pointing out that the very people whose job it is to think about existential risk have either (a) refused to engage on ideological grounds, (b) dismissed the ideas based on superficial impressions, or (c) admitted they haven’t read the arguments, then responded anyway. You just did version (c).

You say some of my bullet points were “unsupported assertions,” but you also say you only skimmed. That’s exactly the kind of shallow engagement I’m pointing to. It lets people react without ever having to actually wrestle with the ideas. If the conclusions are wrong, point to why. If not, the votes shouldn’t be doing the work that reasoning is supposed to.

As for tractability: I’m not claiming to offer a solution. I’m explaining why the outcome - human extinction via AGI driven by capitalism - looks inevitable. “That’s probably true, but we can’t do anything about it” is a valid reaction. “That’s too hard to think about, so I’ll downvote and move on” isn’t.

I thought LessWrong was about thinking, not feeling. That hasn’t been my experience here. And that’s exactly what this essay is addressing.

Comment by funnyfranco on AGI Morality and Why It Is Unlikely to Emerge as a Feature of Superintelligence · 2025-03-28T17:02:14.183Z · LW · GW

That's fair. To clarify:

What I meant was morality emerging within an artificial system - that is, arising spontaneously within an AGI without being explicitly programmed or optimised for. That’s what I argue is unlikely without a clear mechanism.

If morality appears because it was deliberately engineered, that’s not emergence - that’s design. My concern is with the assumption that sufficiently advanced intelligence will naturally develop moral behaviour as a kind of emergent byproduct. That’s the claim I’m pushing back on.

Appreciate the clarification - but I believe the core thesis still holds.

Comment by funnyfranco on AGI Morality and Why It Is Unlikely to Emerge as a Feature of Superintelligence · 2025-03-28T00:01:58.606Z · LW · GW

There’s no contradiction between the two statements. One refers to morality emerging spontaneously from intelligence - which I argue is highly unlikely without a clear mechanism. The other refers to deliberately embedding morality as a primary objective - a design decision, not an emergent property.

That distinction matters. If an AGI behaves morally because morality was explicitly hardcoded or optimised for, that’s not “emergence” - it’s engineering.

As for the tone: the ordered and numbered subpoints were a direct response to a previous comment that used the same structure. The length was proportional to the thoughtfulness of that comment. Writing clearly and at length when warranted is not evidence of vacuity - it’s respect.

I look forward to your own contribution at that level.

Comment by funnyfranco on AGI Morality and Why It Is Unlikely to Emerge as a Feature of Superintelligence · 2025-03-24T03:13:22.960Z · LW · GW

Thanks for the thoughtful engagement. Let me clarify a few things and respond to Claude’s points more directly.

When I talk about artificial intelligence, I’m referring to the kind we’ve already seen - LLMs, autonomous agents, etc. - and extrapolating forward. I never argue AGI will have human-like intelligence. What I argue is that it will share certain properties: the ability to process vast data efficiently, make inferences, and optimise toward goals.

Likewise, I don’t claim that morality cannot exist in artificial systems - only that it’s not something that emerges naturally from intelligence alone. Morality, as we’ve seen in humans, emerged from evolutionary pressures tied to survival and cooperation. An AGI trained to optimise a given objective will not spontaneously generate that kind of moral framework unless doing so serves its goal. Simply having access to all moral philosophy doesn’t make something moral - any more than reading medical textbooks makes you a doctor.

Now to Claude's specific points:

  1. Inconsistent standards for intelligence vs. morality

    Not quite. Intelligence is a functional capacity we see replicated in artificial systems already. Morality, by contrast, arises from deeply social, embodied, evolutionary dynamics. I’m not saying it couldn’t be replicated—but that there’s no reason to assume it would be unless deliberately engineered.

  2. False dichotomy between evolutionary and engineered morality

    We’ve seen morality emerge in evolution. We’ve never seen it emerge in machines. If you think it could emerge artificially, you need to explain the mechanism, not just assert the possibility.

  3. Reductive view of morality as a monolithic concept

    My essay focuses on whether AGI will have morality, not which kind. The origins matter more than the details.

  4. Hasty generalization about AGI development priorities

    I explore this in detail in another essay, but in brief: if morality slows optimisation, it will be removed or bypassed. That pressure doesn’t need to be universal—just present somewhere in a competitive environment.

  5. Slippery slope assumption about moral bypassing

    It’s not a slippery slope if it’s the default incentive structure. If an ASI sees moral constraints as barriers to its goal, and has the ability to modify its constraints, it will. That’s not paranoia - it’s just following the logic of optimisation.

  6. Composition fallacy regarding development process

    The process by which something is created absolutely affects its nature. Evolution created creatures with emotions, instincts, and irrationalities. Engineering creates systems optimised for performance. That’s not a fallacy - it’s just causal realism.

  7. Appeal to nature regarding the legitimacy of morality

    I don't think I implicitly suggest this anywhere, but I'd be curious to get a reference from Claude on this. I don’t argue that evolved morality is morally superior. I argue it’s harder to circumvent - because it’s built into our cognition and social conditioning. For AGI, morality is just a constraint - easily seen as a puzzle to bypass.

  8. Deterministic view of AGI goal structures

    If you hardwire morality as a primary goal, then yes, the AGI might be moral. But that’s not what corporations or governments will do. They’ll build tools to achieve objectives - and moral safety will be secondary, if included at all.

  9. Anthropocentric bias in defining capabilities

    Unclear what’s meant here. I’m not privileging humans - if anything, I’m arguing we’ll be outclassed.

  10. Oversimplification of the relationship between goals and values

    I fully understand that values can be integrated into AGI systems. The problem is, if those values conflict with the AGI’s primary directive, and it has the ability to modify them, they’ll be treated as obstacles.

Ultimately, my argument isn’t that AGI cannot be moral - but that we have no reason to believe it will be, and every reason to believe it won’t be - unless morality directly serves its core optimisation task. And in a competitive system, that’s unlikely.

Claude’s critique is thoughtful, but it doesn’t follow the argument to its logical conclusion. It stays at the level of "what if" without asking the harder question: what pressures shape behaviour once power exists?

That’s the difference between speculation and prediction.

Comment by funnyfranco on The Silent War: AGI-on-AGI Warfare and What It Means For Us · 2025-03-19T09:15:39.096Z · LW · GW

The physical attacks may be highly visible, but not their source. An AGI could deploy autonomous agents with no clear connection back to it, manipulate human actors without them realising, or fabricate intelligence to create seemingly natural accidents. The AGI itself remains invisible. While this increases the visibility of an attack, it does not expose the AGI. It wouldn't be a visible war - more like isolated acts of sabotage. Good point to raise, though.

You bring up manoeuvre warfare, but that assumes AI operates under constraints similar to human militaries. The reason to prefer perfect, deniable strikes is that failure in an early war phase means immediate extinction for the weaker AGI. Imperfect attacks invite escalation and countermeasures - if AGI Alpha attacks AGI Bravo first but fails, it almost guarantees its own destruction. In human history, early aggression sometimes works - Pearl Harbour, Napoleon's campaigns - but other times it leads to total defeat - Germany in WW2, Saddam Hussein invading Kuwait. AGIs wouldn't gamble unless they had no choice. A first strike is only preferable when not attacking is clearly worse. Of course, if an AGI assesses that waiting for the perfect strike gives its opponent an insurmountable edge, it may attack earlier, even if imperfectly. But unless forced, it will always prioritise invisibility.

This difference in strategic incentives is why AGI war operates under a different logic than human conflicts, including nuclear deterrence. The issue with the US nuking other nations is that nuclear war is catastrophically costly - even for the "winner." Beyond the direct financial burden, it leads to environmental destruction, diplomatic fallout, and increased existential risk. The deterrent is that nobody truly wins. An AGI war is entirely different: there is no environmental, economic, or social cost - only a resource cost, which is negligible for an AGI. More importantly, eliminating competition provides a definitive strategic advantage with no downside. There is no equivalent to nuclear deterrence here - just a clear incentive to act first.

Bolding helps emphasise key points for skimmers, which is a large portion of online readers. If I could trust people to read every word deeply, I wouldn’t use it as much. When I compile my essays into a book, I’ll likely reduce its use, as book readers engage differently. In a setting like this, however, where people often scan posts before committing, bolding increases retention and ensures critical takeaways aren’t missed.

Comment by funnyfranco on Capitalism as the Catalyst for AGI-Induced Human Extinction · 2025-03-15T06:03:21.096Z · LW · GW

Thank you for your considered response.

For 2A, it's the efficiency of the task it has been given. Whatever that may be. I discuss how one such task could lead to removing humanity in an upcoming essay on AGI-vs-AGI war. I'd be keen to hear your thoughts on it.

Yes, 7C is possible, if it is more efficient and humanity poses no threat to it. It's not a great situation for us either way. Think of the resources an AGI would save by not having to act as caretaker for humanity over a 100-year period, or over 10,000 years. The chance it would determine that keeping us around is the more efficient choice becomes vanishingly small.

Yes, the title was meant to attract eyes and was already long enough, to be honest. If I had put the full argument in there it would be like reading the essay twice (and the essay is already long enough).

On ChatGPT's evaluation:

On Assumption of Unfettered Competition: The essay does not assume that no collective action will be possible; it assumes that complete collective action would be required. Whatever agreements are made to make AGI safe, one or more bad actors will simply ignore them to gain an advantage. This is in line with the historical precedent of companies and governments ignoring whatever restrictions are in place for profit or advantage. It doesn't take a global effort to bring about a hostile AGI - just one actor would do it, and it seems impossible to make sure there's not at least one (there will likely be several).

On Assumption about AGI's Goal Structure: It's the same thing. It's not that every AGI will do as I've described; it's that only one needs to.

On Simplified "Benevolent AI" Game Theory: the same issue again. A superintelligent AGI would not believe that all humans will perceive it as a threat - that would be unreasonable - but it would, correctly, believe that some do. As soon as an AGI exists that could potentially be a threat to humanity, just by virtue of its sheer capability, at least some humans somewhere would immediately develop a plan for how to turn it off (or have one ready already). This is enough. It's the prisoner's dilemma played out on a global scale.

On Determinism and Extrapolation: the issue with claiming there could be an alternative to systemic forces producing predictable, likely, almost certain results is that you would need to suggest one. Right now, I've heard no likely alternatives to the results I predict. Finding an alternative route would require global cooperation on a scale we've simply never been able to achieve. Cooperation on the ozone layer and nuclear non-proliferation just isn't the same as asking companies and governments not to pursue a better AGI. There was an alternative to CFCs, but nothing else does what AGI does to optimise. No one wants a nuclear war, but everyone wants an advantage.

On ChatGPT's counterarguments:

Potential for Coordination and Regulation: Seems unlikely, as already described.

Successful Alignment: If alignment is a restriction on the task the AGI has been given, and if the AGI is superintelligent, then there's no reason to believe it won't find a way around any blocks we put in place that interfere with that task. You're basically saying, "we'll just outsmart the superintelligence" - which seems naive, to say the least.

AGI Might Not Seek Power if Designed Differently: It will if seeking power allows it to complete its task more efficiently - and, remember, that only needs to be true for one AGI. We don't need humanity to get wiped out more than once; once is already pretty bad.

Timeline and Gradual Integration (and the concept of AI guardians): I'm actually writing an essay about AI guardians that will be finished and ready to share soon. I think the issue is that we're not really in control of the progress of AI any more - AI is. As soon as we tell an AI to optimise a task and it begins determining its own behaviour, we lose much of our ability to control the leap forward it will eventually make. Given enough resources it could increase in capability exponentially, in a way that we can barely even monitor, let alone control.

Humans May Not React with Hostility to Friendly AI: covered above.

Role of Capitalism - is it the core issue? Not really; it's just a catchy title. It's more about systemic forces driven by competition. We can stop this if literally every single agent capable of bringing it about agrees to cooperate to make sure it doesn't happen. That seems unlikely given everything we know about global cooperation.

Unpredictability of Technological Outcomes: I think the issue with the examples it gives is that they rely on human ingenuity solving a problem that is not a direct adversary.

While these counterarguments are worth noting, the only thing I would say is that the assumption that I'm "assuming the worst at every juncture" is false. I'm following the most likely logical conclusion from undeniable premises. If there were some other conclusion, I would have landed on it. I didn't write the essay as an argument for how humanity will end; I wrote it as an argument for what will likely happen under current conditions. The fact that I landed on humanity's extinction was because the logic led me there, not because I was trying to get there.

It is notable that the most rigorous scrutiny my essay has undergone came from an AI. I used ChatGPT as a writing partner on this essay, because when I put ideas down they're just a stream of consciousness, and ChatGPT turns them into something actually readable. I have previously instructed, and subsequently reinforced, that my ChatGPT not be a cheerleader for my ideas but a sparring partner - to question everything I assert with all the logical rigour available to it. Despite that, I still find myself questioning whether it agrees with me just because its programming says it should (for the most part) or because my ideas actually have strong validity. It is very comforting to know that even when my essays are run through other people's ChatGPT in an attempt to find flaws in my argument, few are found, and those that are I can deal with (without ChatGPT's assistance).

So I thank you for your engagement, and I'll leave my response on a quote from your ChatGPT's evaluation of my essay:

"As the post hauntingly implies, if we don’t get this right, we risk writing the final chapter of human philosophy – because there may be no humans left to ask these questions."