LessWrong 2.0 Reader


Signaling with Small Orange Diamonds
jefftk (jkaufman) · 2024-11-07T20:20:08.026Z · comments (1)

[link] Understanding Gödel’s completeness theorem
jessicata (jessica.liu.taylor) · 2024-05-27T18:55:02.079Z · comments (0)

[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)

Litigate-for-Impact: Preparing Legal Action against an AGI Frontier Lab Leader
Sonia Joseph (redhat) · 2024-12-07T21:42:29.038Z · comments (7)

[link] The Way According To Zvi
Sable · 2024-12-07T17:35:48.769Z · comments (3)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (9)

(Appetitive, Consummatory) ≈ (RL, reflex)
Steven Byrnes (steve2152) · 2024-06-15T15:57:39.533Z · comments (1)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

Winning isn't enough
JesseClifton · 2024-11-05T11:37:39.486Z · comments (18)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

Orca communication project - seeking feedback (and collaborators)
Towards_Keeperhood (Simon Skade) · 2024-12-03T17:29:40.802Z · comments (16)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

Drug development costs can range over two orders of magnitude
rossry · 2024-11-03T23:13:17.685Z · comments (0)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

You can just wear a suit
lsusr · 2025-02-26T14:57:57.260Z · comments (4)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (7)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (1)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

Index of rationalist groups in the Bay Area July 2024
Lucie Philippon (lucie-philippon) · 2024-07-26T16:32:25.337Z · comments (14)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)
Roland Pihlakas (roland-pihlakas) · 2025-01-12T03:37:59.692Z · comments (7)

Six Small Cohabitive Games
Screwtape · 2025-01-15T21:59:29.778Z · comments (7)

The Laws of Large Numbers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-04T11:54:16.967Z · comments (11)

[link] Scaling Wargaming for Global Catastrophic Risks with AI
rai (nonveumann) · 2025-01-18T15:10:39.696Z · comments (2)

Resolving von Neumann-Morgenstern Inconsistent Preferences
niplav · 2024-10-22T11:45:20.915Z · comments (5)

Lecture Series on Tiling Agents
abramdemski · 2025-01-14T21:34:03.907Z · comments (14)

How accurate was my "Altered Traits" book review?
lsusr · 2025-02-18T17:00:55.584Z · comments (3)

Self-dialogue: Do behaviorist rewards make scheming AGIs?
Steven Byrnes (steve2152) · 2025-02-13T18:39:37.770Z · comments (0)

Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-01-07T03:08:51.447Z · comments (2)

The quantum red pill or: They lied to you, we live in the (density) matrix
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-17T13:58:16.186Z · comments (34)


Recent comments

abhinav-pola on Nathan Helm-Burger's Shortform

Courtesy of Claude Code ;)

https://github.com/abhinavpola/crosstalk

cam-tice on Fuzzing LLMs sometimes makes them reveal their secrets

Great to see it come full circle. For the sake of nostalgia, here's the original thread that jump-started the project.

crispy on We Can Build Compassionate AI

1. **Religious texts and violence**: While Abrahamic texts do contain violent passages, characterizing the "overwhelming majority" as "justifications for genocide and ethnic supremacy" is factually incorrect. These texts contain diverse content including ethical teachings, poetry, historical narratives, and legal codes. The violent passages represent a minority of the content.

The overwhelming majority of the praise hymns, poetry, and historical narratives are praising and expressing gratitude for genocide, ethnic cleansing, colonialism, and violence and the deity they claim caused those things to happen on their behalf. The ethical teachings are ethnic supremacy practices that are reserved solely for use by the practitioners.

2. **"2,000 years of the worst violence in history"**: This statement ignores that violence has existed in all human societies regardless of religion. It also overlooks that many historical atrocities were driven by non-religious ideologies (e.g., 20th century totalitarian regimes).

It doesn’t ignore them at all, it categorizes them as the worst. Based on loss of life and economic costs, the religious warfare and religious expansionism of the last 2,000 years have no precedent in history.

To ignore the religious basis for 20th century totalitarian regimes is to ignore history. The expansion of Christianism into the “godless Soviet Union” was a key element in Lebensraum and the ethnic persecutions of the NSDAP were entirely based in Christian doctrine. With “Gott mit uns” (God with us) on their belt buckles, Bibles that portrayed Jesus as an anti-Jewish warrior in their pockets, banners that read “Hitler’s fight and Luther’s teaching are the best defense for the German people”, they marched to a war that still defines the world today. They felt really good about it because, like Manifest Destiny, it was undertaken with religious authority. As everyone’s least favorite Austrian corporal said, "We tolerate no one in our ranks who attacks the ideas of Christianity. Our movement is Christian."

3. **Religious monopoly on compassion**: While some religious groups do claim exclusive moral authority, many traditions explicitly teach universal compassion that extends beyond group boundaries. The comment oversimplifies complex theological positions across diverse traditions.

They teach universal compassion under their own banner. The era of colonialism was viewed as compassionate. Forced birth is viewed as compassionate. Abrahamic religions view soteriological compassion as outweighing physical compassion. Canada and Australia are still grappling with the compassionate religious programs that separated families. The U.S. has yet to address its own history of kidnapping and killing native children to advance its religion.

4. **Platonic origins claim**: The assertion that Abrahamic religions derived their concepts of compassion and empathy primarily from Plato is historically questionable. While Hellenistic philosophy influenced later Jewish and Christian thought, these traditions also drew from their own cultural and textual sources that pre-dated significant Greek influence.

The Platonic concept of the imperishable soul as the basis for the fragmentation of Second Temple Judaism into the sects of the Sadducees and Pharisees isn’t questioned by scholars. It marks the introduction of the “post-kleos society” into Canaan where (like in India and Greece) glory was previously obtainable through acts of tremendous bloodshed or martyrdom. The imperishable soul, and its continued existence in an afterlife, gave compassion and empathy persistent value that transferred to that afterlife both with the giver and recipient. For the first time, compassion had salvific value. Christianism took it even further and gamified compassion and empathy through their quantification. In both cases compassion and empathy resulted in direct reward in Plato’s afterlife. Prior to the introduction of Platonic concepts of compassion and empathy, the closest thing Judaism had was helping other Jews meet their religious obligations. A good introductory read on the topic is “Heaven and Hell: A History of the Afterlife” by Bart Ehrman. It’s written in an approachable style and doesn’t require a lot of background on the subject.

5. **"Universal religion"**: This term is never clearly defined, making many of the claims difficult to evaluate precisely.

Agreed.

nathan-helm-burger on Campbell Hutcheson's Shortform

I mean, suicide seems much more likely to me given the circumstances... but I also would describe this as compelling evidence. Like, if he had been killed and there wasn't a fight, him being drunk makes sense as a way to have pre-rendered him helpless by someone planning to kill him? Similarly, wouldn't a cold-blooded killer be expected to be wearing gloves and to place Suchir's hand on the gun before shooting him?

nathan-helm-burger on Fuzzing LLMs sometimes makes them reveal their secrets

Nice to see my team's work (Tice 2024) getting used!

david-gross on Matthew Yglesias - Misinformation Mostly Confuses Your Own Side

For what it's worth, here's an excerpt from my book on historical tax resistance campaigns that makes a similar point:

Radical honesty means abjuring subterfuge—conducting your campaign in the open, in plain sight, without trying to take your opponent by surprise through trickery, and without trying to influence people by “spin” and lopsided propaganda. It also means studiously refusing to participate in the dishonesty by which your opponent holds on to power and deceives those who submit to it. Radical honesty has several potential advantages:

1. Honesty provides a stark moral contrast between your campaign and whatever institution you are opposing.

In The Story of Bardoli, Mahadev Desai described how this played out in the Bardoli tax strike:

…a regular propaganda of mendacity was resorted to [by the Government]. The Government’s way and the people’s way presented a striking study in contrasts. On one side there were secrecy, underhand dealings, falsehood, even sharp practice; on the other there were straight and manly speech, and straight action in broad daylight.

This contrast can make your campaign more appealing to potential resisters and to bystanders, and can increase the morale of the resisters in your campaign.

2. Honesty itself is a threat to tyranny.

The way people signal their loyalty to tyranny is to participate in the lies that bolster its power. When everyone around you goes along with the lies, it feels like everyone is loyal to the tyrant. Czech dissident Václav Havel wrote of how this worked under communist tyranny:

Individuals need not believe all these mystifications, but they must behave as though they did, or they must at least tolerate them in silence, or get along well with those who work with them. For this reason, however, they must live within a lie. They need not accept the lie. It is enough for them to have accepted their life with it and in it. For by this very fact, individuals confirm the system, fulfill the system, make the system, are the system.

But, he said, people may start to refuse:

Living within the lie can constitute the system only if it is universal. The principle must embrace and permeate everything. There are no terms whatsoever on which it can coexist with living within the truth, and therefore everyone who steps out of line denies it in principle and threatens it in its entirety.

Tolstoy went further, and claimed that radical honesty not only threatens tyrants but constitutes a revolution:

No feats of heroism are needed to achieve the greatest and most important changes in the existence of humanity; neither the armament of millions of soldiers, nor the construction of new roads and machines, nor the arrangement of exhibitions, nor the organization of workmen’s unions, nor revolutions, nor barricades, nor explosions, nor the perfection of aërial navigation; but a change in public opinion.
And to accomplish this change no exertions of the mind are needed, nor the refutation of anything in existence, nor the invention of any extraordinary novelty; it is only needful that we should not succumb to the erroneous, already defunct, public opinion of the past, which governments have induced artificially; it is only needful that each individual should say what he really feels or thinks, or at least that he should not say what he does not think.
And if only a small body of the people were to do so at once, of their own accord, outworn public opinion would fall off us of itself, and a new, living, real opinion would assert itself. And when public opinion should thus have changed without the slightest effort, the internal condition of men’s lives which so torments them would change likewise of its own accord.
One is ashamed to say how little is needed for all men to be delivered from those calamities which now oppress them; it is only needful not to lie.

3. Honesty keeps your campaign from deluding itself.

In a tax resistance campaign, as in any activist campaign, there are frequently temptations to take short-cuts. Rather than winning a victory after a tough and uncertain struggle, you can declare victory early and hope to capitalize on the resulting morale boost. Or, rather than doing something practical that takes a lot of thankless hours, you can do something quick and symbolic that “makes a powerful statement.” Or, rather than fighting for goals that are worth achieving, you can pick goals that are easily achievable but that aren’t really worth fighting for.

Radical honesty gets you in the habit of avoiding temptations like these. By facing your situation forthrightly, and by evaluating your tactics unflinchingly and without self-flattery, you become more apt to make effective decisions.

4. Honesty is itself a good thing worth contributing to.

If you conduct your campaign in a radically honest way, you contribute to a cultural atmosphere of trust and straightforward communication. In this way, even if you do not succeed in the other goals of your tax resistance campaign, you still may have some residual positive effect on the world around you.

5. Honesty means there’s a lot you no longer have to worry about.

When you practice radical honesty, you don’t have to worry about keeping your stories straight, you don’t have to worry about leaks of information that might cast doubt on your credibility, you don’t have to be as concerned about information security, and you don’t have to worry about spies and informers in your midst who might blab your secrets to the authorities. This leaves you free to spend your energy and attention playing offense instead of defense.

When Gandhi heard concerns that government agents had infiltrated the Indian independence movement, he wrote:

This desire for secrecy has bred cowardice amongst us and has made us dissemble our speech. The best and the quickest way of getting rid of this corroding and degrading Secret Service is for us to make a final effort to think everything aloud, have no privileged conversation with any soul on earth and to cease to fear the spy. We must ignore his presence and treat everyone as a friend entitled to know all our thoughts and plans. I know that I have achieved most satisfactory results from evolving the boldest of my plans in broad daylight. I have never lost a minute’s peace for having detectives by my side. The public may not know that I have been shadowed throughout my stay in India. That has not only not worried me but I have even taken friendly services from these gentlemen: many have apologized for having to shadow me. As a rule, what I have spoken in their presence has already been published to the world. The result is that now I do not even notice the presence of these men and I do not know that the Government is much the wiser for having watched my movements through its secret agency.

What’s the catch? For one thing, for a campaign to be radically honest it needs to have fairly tight control over its message. Not just anyone can be a spokesperson, but only those with the talent to speak precisely and to cut through the sorts of baloney that characterize political debate in this era of spin doctors and pundits and talking points.

Another difficulty is that if your campaign already has a credibility problem, it’s going to take a lot of radical honesty to dig you out of that hole.

Also, it seems that at least some of the benefits of radical honesty only emerge when it has become really radical and pervasive. Half-hearted gestures of radical honesty are just another form of machiavellian communication. If you’re not prepared to go all the way, it may not be to your advantage to put in the extra effort.

gordon-seidoh-worley on We Can Build Compassionate AI

Your claims overgeneralize, and that makes them false. To ensure I'm not just biased because I am religious myself, here's Claude's take on your comment when I asked it to fact-check it.

Prompt: can you fact check this comment that was posted to an online forum (not mine but i suspect it might not be correct but don't trust myself not to be biased so looking for your take) [your comment copied in]

Response:

I'll analyze this forum comment by examining its claims about religion, compassion, and philosophical history.
The comment contains several sweeping generalizations and historical inaccuracies:
1. **Religious texts and violence**: While Abrahamic texts do contain violent passages, characterizing the "overwhelming majority" as "justifications for genocide and ethnic supremacy" is factually incorrect. These texts contain diverse content including ethical teachings, poetry, historical narratives, and legal codes. The violent passages represent a minority of the content.
2. **"2,000 years of the worst violence in history"**: This statement ignores that violence has existed in all human societies regardless of religion. It also overlooks that many historical atrocities were driven by non-religious ideologies (e.g., 20th century totalitarian regimes).
3. **Religious monopoly on compassion**: While some religious groups do claim exclusive moral authority, many traditions explicitly teach universal compassion that extends beyond group boundaries. The comment oversimplifies complex theological positions across diverse traditions.
4. **Platonic origins claim**: The assertion that Abrahamic religions derived their concepts of compassion and empathy primarily from Plato is historically questionable. While Hellenistic philosophy influenced later Jewish and Christian thought, these traditions also drew from their own cultural and textual sources that pre-dated significant Greek influence.
5. **"Universal religion"**: This term is never clearly defined, making many of the claims difficult to evaluate precisely.
The comment does raise legitimate concerns about religious exclusivism and historical misuse of religion to justify violence, but its broad generalizations undermine its credibility as an objective analysis of religion's relationship to compassion and empathy.

Point 5 is obviously an artifact of me failing to give Claude context on what universal religion means, and I didn't define it in the article, but I think it's clear what I mean: religions that see it as their purpose to apply to all people, not just to a single ethnic group or location.

owain_evans on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

We can be fairly confident the models we created are safe. Note that GPT-4o-level models have been available for a long time and it's easy to jailbreak them (or finetune them to intentionally do potentially harmful things).

brambleboy on Dream, Truth, & Good

I don't think your truth machine would work because you misunderstand what makes LLMs hallucinate. Predicting what a maximum-knowledge author would write induces more hallucinations, not less. For example, say you prompted your LLM to predict text supposedly written by an omniscient oracle, and then asked "How many fingers am I holding behind my back?" The LLM would predict an answer like "three" or something, because an omniscient person would know that, even though it's probably not true.

In other words, you'd want the system to believe "this writer I'm predicting knows exactly what I do, no more, no less", not "this writer knows way more than me". Read "Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?" for evidence of this.

What would work even better would be for the system to simply be Writing instead of Predicting What Someone Wrote, but nobody's done that yet. (because it's hard)

nathan-helm-burger on lsusr's Shortform

Not always true. Sometimes the locks are 'real' but deliberately chosen to be easy to pick, and the magician practices picking that particular lock. This doesn't change the point much, which is that watching stage magicians is not a good way to get an idea of how hard it is to do X, for basically any value of X. LockPickingLawyer on YouTube is a fun way to learn about locks.