LessWrong 2.0 Reader

[question] Are Sparse Autoencoders a good idea for AI control?
Gerard Boxo (gerard-boxo) · 2024-12-26T17:34:55.617Z · answers+comments (2)

[link] When do experts think human-level AI will be created?
Vishakha (vishakha-agrawal) · 2024-12-30T06:20:33.158Z · comments (0)

[link] The Economics & Practicality of Starting Mars Colonization
Zero Contradictions · 2024-12-26T10:56:26.019Z · comments (1)

Algorithmic Asubjective Anthropics, Cartesian Subjective Anthropics
Lorec · 2024-12-27T01:58:39.880Z · comments (0)

2. Skim the Manual: Intelligent Voluntary Cooperation
Allison Duettmann (allison-duettmann) · 2025-01-02T19:02:06.864Z · comments (0)

Teaching Claude to Meditate
Gordon Seidoh Worley (gworley) · 2024-12-29T22:27:44.657Z · comments (3)

I Recommend More Training Rationales
Gianluca Calcagni (gianluca-calcagni) · 2024-12-31T14:06:44.007Z · comments (0)

Duplicate token neurons in the first layer of gpt2-small
Alex Gibson · 2024-12-27T04:21:55.896Z · comments (0)

[link] Riffing on Machines of Loving Grace
an1lam · 2025-01-01T01:06:45.122Z · comments (0)

Towards a Unified Interpretability of Artificial and Biological Neural Networks
jan_bauer · 2024-12-21T23:10:45.842Z · comments (0)

[Rationality Malaysia] 2024 year-end meetup!
Doris Liew (doris-liew) · 2024-12-23T16:02:03.566Z · comments (0)

On False Dichotomies
nullproxy · 2025-01-02T18:54:21.560Z · comments (0)

[link] World models I'm currently building
xpostah · 2024-12-30T08:26:16.972Z · comments (0)

Alienable (not Inalienable) Right to Buy
FlorianH (florian-habermacher) · 2025-01-01T12:19:03.691Z · comments (4)

Game Theory and Behavioral Economics in The Stock Market
Jaiveer Singh (jaiveer-singh) · 2024-12-24T18:15:55.468Z · comments (0)

[question] What are the main arguments against AGI?
Edy Nastase (edy-nastase) · 2024-12-24T15:49:03.196Z · answers+comments (6)

[link] AGI is what generates evolutionarily fit and novel information
onur · 2025-01-01T09:22:55.841Z · comments (0)

ARC-AGI is a genuine AGI test but o3 cheated :(
Knight Lee (Max Lee) · 2024-12-22T00:58:05.447Z · comments (2)

Emergence and Amplification of Survival
jgraves01 · 2024-12-28T23:52:47.893Z · comments (0)

The Great OpenAI Debate: Should It Stay ‘Open’ or Go Private?
Satya (satya-2) · 2024-12-30T01:14:28.329Z · comments (0)

The AI Agent Revolution: Beyond the Hype of 2025
DimaG (di-wally-ga) · 2025-01-02T18:55:22.824Z · comments (0)

Morality Is Still Demanding
utilistrutil · 2024-12-29T00:33:40.471Z · comments (2)

The Opening Salvo: 1. An Ontological Consciousness Metric: Resistance to Behavioral Modification as a Measure of Recursive Awareness
Peterpiper · 2024-12-25T02:29:52.025Z · comments (0)

Making LLMs safer is more intuitive than you think: How Common Sense and Diversity Improve AI Alignment
Jeba Sania (jeba-sania) · 2024-12-29T19:27:35.685Z · comments (0)

[link] Merry Sciencemas: A Rat Solstice Retrospective
leebriskCyrano · 2025-01-01T01:08:36.433Z · comments (0)

Turing-Test-Passing AI implies Aligned AI
Roko · 2024-12-31T19:59:27.917Z · comments (28)

Action: how do you REALLY go about doing?
DDthinker · 2024-12-29T22:00:24.915Z · comments (0)

How Business Solved (?) the Human Alignment Problem
Gianluca Calcagni (gianluca-calcagni) · 2024-12-31T20:39:59.067Z · comments (1)

[link] Human, All Too Human - Superintelligence requires learning things we can’t teach
Ben Turtel (ben-turtel) · 2024-12-26T16:26:27.328Z · comments (4)

Aristotle, Aquinas, and the Evolution of Teleology: From Purpose to Meaning.
Spiritus Dei (spiritus-dei) · 2024-12-23T19:37:58.788Z · comments (0)

Woloch & Wosatan
JackOfAllTrades (JackOfAllSpades) · 2024-12-22T15:46:27.235Z · comments (0)

Terminal goal vs Intelligence
Donatas Lučiūnas (donatas-luciunas) · 2024-12-26T08:10:42.144Z · comments (24)

Propaganda Is Everywhere—LLM Models Are No Exception
Yanling Guo (yanling-guo) · 2024-12-23T01:39:03.777Z · comments (0)

The Engineering Argument Fallacy: Why Technological Success Doesn't Validate Physics
Wenitte Apiou (wenitte-apiou) · 2024-12-28T00:49:53.300Z · comments (5)

Rejecting Anthropomorphic Bias: Addressing Fears of AGI and Transformation
Gedankensprünge (gedankenspruenge) · 2024-12-29T01:48:47.583Z · comments (1)

AI Alignment, and where we stand.
afeller08 · 2024-12-29T14:08:47.276Z · comments (0)

So you want to be a witch
lucid_levi_ackerman · 2024-12-31T04:31:52.196Z · comments (3)

The Misconception of AGI as an Existential Threat: A Reassessment
Gedankensprünge (gedankenspruenge) · 2024-12-29T01:39:57.780Z · comments (0)

Recent comments

alexander-gietelink-oldenziel on quetzal_rainbow's Shortform

Not a biologist but my impression is that a lot of progress in biology came from refining and validating existing techniques. Also building up a large library of biological specimens & phenomena, i.e. taxonomy. The esthetic and practice of MechInterp seems in accord with that.

alexander-gietelink-oldenziel on Beyond Kolmogorov and Shannon

Yes, you make a good point. This post was written when our (my) understanding was less developed. You might be interested in taking a look at Kolmogorov Algorithmic Sufficient Statistics for a more sophisticated framework that makes the noise-structure decomposition in a principled way.

benito on Benito's Shortform Feed

It seems worth mentioning that punishments for financial crime often include measures like "person gets banned from their industry" or them getting banned from participating in all kinds of financial schemes... But in theory, I like the idea of adding things to the sentencing that make re-offending less likely. This way, you can maybe justify giving people second chances.

Good point. I can imagine things like "permanent parole" (note that probation and parole are meaningfully different) or being under house arrest or having constraints on your professional responsibilities or finances or something, being far better than literal incarceration.

benito on Benito's Shortform Feed

I agree there are people who do small amounts of damage to society, are caught, and do not reoffend. Then there are other people whose criminal activities will be most of their effect on society, will reliably reoffend, and for whom the incapacitation strongly works out positive in consequentialist terms. My aim would be to have some way of distinguishing between them.

The amount of evidence we have about Bankman-Fried's character is quite different than that of most con men, including from childhood and from his personal diary, so I hope we can have more confidence based on that. But a different solution is to not do any psychologizing, and just judge based on reoffending. See this section from the ACX post:

In 2001, the Dutch government passed a law allowing longer sentences for criminals with at least ten previous offenses who were not good targets for rehabilitation (eg rejected or had already failed drug treatment). The law allowed judges to increase the typical sentence for petty theft (2 months) to a longer sentence (2 years). A quasi-experimental study found that property crime, though not violent crime, decreased by 25%. It’s not surprising that violent crime didn’t go down since the law was almost entirely deployed against thieves.
Vollaard found that the population affected was extremely criminal; they had an average of 31 past offenses, and on surveys they admitted to committing an average of 256 crimes per year (mostly shoplifting). Before the law was passed, they spent an average of four months per year in jail (probably 2 x 2 month sentences); afterwards, they spent two years in jail per crime.

I should add that Scott has lots of concerns about doing this in the US, and argues that properly doing this in the US would massively increase the incarcerated population. I didn't quite follow his concerns, but I was not convinced that something like this would be a bad idea on consequentialist grounds, even if the incarcerated population were to massively increase. (Note that I would support improving the quality of prisons to being broadly as nice as outside of prisons.)

satron on Alignment Is Not All You Need

I will try to write down my thoughts on these problems below:

1) The Coordination Problem

For any organization developing AI, failing to align it is at least as dangerous as losing the AI race altogether. If an organization has already secured the resources needed to win the capabilities race and has a functioning alignment solution (two of the most challenging hurdles), I'd be confident that it can successfully implement that solution (which, in comparison, seems like the easiest part). The risks of failing to implement alignment solutions are essentially the same as the risks of not having an alignment solution in the first place:

If you don't have a working alignment solution, you die.

If you fail to implement a working alignment solution, you die.

Companies spending considerable resources on creating a working solution to the alignment problem will have all the same reasons for actually implementing it.

2) The Power Distribution Problem

I wouldn't necessarily frame this as a problem. Consider a world where multiple entities control AI—this scenario appears quite a bit more problematic. As it stands, the US is seemingly at the forefront of the AI race. Do we really want China and Russia to develop their own AIs? Even more troubling is the idea of multiple individuals owning superhuman AI. Just one person bent on global vengeance could lead to catastrophic outcomes. I'd be much more inclined to trust the AI race's winner to act in humanity's best interest than to rely on the goodness of every individual AI owner (including the winner of the AI race).

If the winner of the AI race will not act in humanity's best interests, then we won't have the means to make him share AI with others.

If the winner of the AI race will act in humanity's best interests, then we won't want him to share AI with other agents who might not act in humanity's best interests.

3) The Economic Transition Problem

If AI is aligned with human values, there is no need for humans to retain economic control. AI would simply leverage our economic resources for the benefit of humanity.

jasoncrawford on Biological risk from the mirror world

Yes, they would not be made from mirror components!

spencer-ericson on Turing-Test-Passing AI implies Aligned AI

If the clone behaves indistinguishably from the human it is based on, then there is simply nothing more to say. It doesn't matter what is going on inside.

Right, I agree on that. The problem is, "behaves indistinguishably" for how long? You can't guarantee whether it will stop acting that way in the future, which is what is predicted by deceptive alignment.

peter-berggren on Bug Hunt 2

My greatest ambition is to create a fully trainable art of rationality that’s so good it gets taught to every high schooler in the country and bankrupts multiple industries that prey on irrational behavior in the process. Although it may seem impossible, the success of anti-smoking efforts against an extremely addictive product with a massive advertising industry suggests that it's achievable, and the fact that the Internet exists now and didn't exist then suggests it's even easier than that was.

lc on Benito's Shortform Feed

Suppose Sam Bankman-Fried is imprisoned for 25 years. After that time, he will be a decent, law-abiding member of society, who is safe to release from prison.

I voted 75% because taken literally I think in 25 years AI will be so advanced that he won't have much of an ability to impact the world at all 🤓

(Otherwise 40%)

habryka4 on 2024 Unofficial LessWrong Census/Survey

Lol, I am sorry about the fundraising email. It was really quite embarrassing.

(Context: a recent fundraising email I sent out to a bunch of old LessWrong accounts had unsubscribe links that pointed to localhost:3000 instead of lesswrong.com, which of course is the most important link not to break.)