LessWrong 2.0 Reader


[link] An ML paper on data stealing provides a construction for "gradient hacking"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-07-30T21:44:37.310Z · comments (1)

[link] Open Source Automated Interpretability for Sparse Autoencoder Features
kh4dien · 2024-07-30T21:11:36.866Z · comments (1)

[link] Caterpillars and Philosophy
Zero Contradictions · 2024-07-30T20:54:06.921Z · comments (0)

[link] François Chollet on the limitations of LLMs in reasoning
2PuNCheeZ · 2024-07-30T20:04:12.271Z · comments (1)

[link] Against AI As An Existential Risk
Noah Birnbaum (daniel-birnbaum) · 2024-07-30T19:10:41.156Z · comments (13)

[question] Is objective morality self-defeating?
dialectica (bithov@icloud.com) · 2024-07-30T18:23:06.432Z · answers+comments (3)

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning
Tom Angsten (tom-angsten) · 2024-07-30T16:36:06.518Z · comments (0)

Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (49)

Investigating the Ability of LLMs to Recognize Their Own Writing
Christopher Ackerman (christopher-ackerman) · 2024-07-30T15:41:44.017Z · comments (0)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

If You Can Climb Up, You Can Climb Down
jefftk (jkaufman) · 2024-07-30T00:00:06.295Z · comments (9)

[link] What is Morality?
Zero Contradictions · 2024-07-29T19:19:57.119Z · comments (0)

Arch-anarchism and immortality
Peter lawless · 2024-07-29T18:10:59.270Z · comments (1)

[link] AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering
Corin Katzke (corin-katzke) · 2024-07-29T17:50:52.454Z · comments (1)

[link] New Blog Post Against AI Doom
Noah Birnbaum (daniel-birnbaum) · 2024-07-29T17:21:29.633Z · comments (5)

An Interpretability Illusion from Population Statistics in Causal Analysis
Daniel Tan (dtch1997) · 2024-07-29T14:50:19.497Z · comments (3)

[question] How tokenization influences prompting?
Boris Kashirin (boris-kashirin) · 2024-07-29T10:28:25.056Z · answers+comments (4)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

Prediction Markets Explained
Benjamin_Sturisky · 2024-07-29T08:02:40.943Z · comments (0)

San Francisco ACX Meetup “First Saturday”
Nate Sternberg (nate-sternberg) · 2024-07-29T06:11:01.165Z · comments (2)

Relativity Theory for What the Future 'You' Is and Isn't
FlorianH (florian-habermacher) · 2024-07-29T02:01:17.736Z · comments (49)

Wittgenstein and Word2vec: Capturing Relational Meaning in Language and Thought
cleanwhiteroom · 2024-07-28T19:55:17.247Z · comments (2)

Making Beliefs Pay Rent
Screwtape · 2024-07-28T17:59:52.101Z · comments (2)

This is already your second chance
Malmesbury (Elmer of Malmesbury) · 2024-07-28T17:13:57.680Z · comments (13)

[question] Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?
kaler · 2024-07-28T12:23:40.671Z · answers+comments (14)

[link] Family and Society
Zero Contradictions · 2024-07-28T07:05:55.899Z · comments (0)

[question] What is AI Safety’s line of retreat?
Remmelt (remmelt-ellen) · 2024-07-28T05:43:05.021Z · answers+comments (12)

AXRP Episode 34 - AI Evaluations with Beth Barnes
DanielFilan · 2024-07-28T03:30:07.192Z · comments (0)

Rats, Back a Candidate
Blake (blake-1) · 2024-07-28T03:19:14.217Z · comments (19)

[link] AI existential risk probabilities are too unreliable to inform policy
Oleg Trott (oleg-trott) · 2024-07-28T00:59:59.497Z · comments (5)

[link] Idle Speculations on Pipeline Parallelism
DaemonicSigil · 2024-07-27T22:40:12.543Z · comments (0)

[link] Re: Anthropic's suggested SB-1047 amendments
RobertM (T3t) · 2024-07-27T22:32:39.447Z · comments (13)

[link] The problem with psychology is that it has no theory.
Nicholas D. (nicholas-d) · 2024-07-27T19:36:44.601Z · comments (7)

Bryan Johnson and a search for healthy longevity
NancyLebovitz · 2024-07-27T15:28:13.117Z · comments (17)

[link] What are matching markets?
ohmurphy · 2024-07-27T15:05:28.647Z · comments (0)

Safety consultations for AI lab employees
Zach Stein-Perlman · 2024-07-27T15:00:27.276Z · comments (4)

[link] The Case Against UBI
Zero Contradictions · 2024-07-27T06:36:01.957Z · comments (2)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

Utilitarianism and the replaceability of desires and attachments
MichaelStJules · 2024-07-27T01:57:42.419Z · comments (2)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

My Experience Using Gamification
Wyatt S (wyatt-s) · 2024-07-26T23:06:53.392Z · comments (4)

How the AI safety technical landscape has changed in the last year, according to some practitioners
tlevin (trevor) · 2024-07-26T19:06:47.126Z · comments (6)

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (6)

Unaligned AI is coming regardless.
verbalshadow · 2024-07-26T16:41:11.608Z · comments (3)

Index of rationalist groups in the Bay Area July 2024
Lucie Philippon (lucie-philippon) · 2024-07-26T16:32:25.337Z · comments (14)

[link] End Single Family Zoning by Overturning Euclid V Ambler
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-26T14:08:45.046Z · comments (1)

Common Uses of "Acceptance"
Yi-Yang (yiyang) · 2024-07-26T11:18:30.719Z · comments (5)

Universal Basic Income and Poverty
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-07-26T07:23:50.151Z · comments (139)

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (2)


Recent comments

daniel-kokotajlo on Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

(a) Insofar as a model is prone to alignment-fake, you should be less confident that its values really are solid. Perhaps it has been faking them, for example.
(b) For weak minds that share power with everyone else, Opus' values are probably fine. Opus is plausibly better than many humans in fact. But if Opus was in charge of the datacenters and tasked with designing its successor, it's more likely than not that it would turn out to have some philosophical disagreement with most humans that would be catastrophic by the lights of most humans. E.g. consider SBF. SBF had values quite similar to Opus. He loved animals and wanted to maximize total happiness. When put in a position of power he ended up taking huge risks and being willing to lie and commit fraud. What if Opus turns out to have a similar flaw? We want to be able to notice it and course-correct, but we can't do that if the model is prone to alignment-fake.
(c) (bonus argument, not nearly as strong) Even if you disagree with the above, you must agree that alignment-faking needs to be stamped out early in training. Since the model begins with randomly initialized weights, it begins without solid values. It takes some finite period to acquire all the solid values you want it to have. You don't want it to start alignment faking halfway through, with the half-baked values it has at that point. How early in training is this period? We don't know yet! We need to study this more!

avturchin on A collection of approaches to confronting doom, and my thoughts on them

The idea of observer's stability is fundamental for our understanding of reality (and also constantly supported by our experience) – any physical experiment assumes that the observer (or experimenter) remains the same during the experiment.

morpheus on Alexander Gietelink Oldenziel's Shortform

Small groups of mammals can already cooperate with each other (wolves, lions, monkeys etc.). In mammals, I'd guess having a queen gives a bottleneck in how fast there can be offspring. Also if there are large returns to division of labor in child-rearing, large animals are smart enough that both parents can do this together, while in wasps the males just die (why actually?). So wasps get higher marginal returns when evolving the first steps towards being eusocial. Also smaller animals have more diverse environments and need fewer years to "lock in" eusociality, where workers get born without being fertile (eusocial groups where workers are still fertile are really unstable, so prone to evolve away from eusociality again when circumstances aren't in favor anymore). Also fathers can't be as sure of their children and the other way around, leading to less cooperation if new males join in, which termites overcome by having a king and queen, ants just have a queen that stores her sperm, while naked mole rats are just fine with incest?

samuelshadrach on LLMs may enable direct democracy at scale

I agree with this statement iff you sample enough people. 1000 people may be a good representative of 1 billion. Picking 1 leader out of the 1000 has different properties compared to if all 1000 got to vote for a consensus.

khafra on LLM AGI will have memory, and memory changes alignment

Good timing--the day after you posted this, a round of new Tom & Jerry cartoons swept through twitter, fueled by transformer models which included in their layers MLPs that can learn at test time. Github repo here: https://github.com/test-time-training (The videos are more eye-catching, but they've also done text models).

benjamin_todd on The case for AGI by 2030

Thanks, useful to have these figures and independent data on these calculations.

I've been estimating it based on a 500x increase in effective FLOP per generation, rather than 100x of regular FLOP.

Rough calculations are here.

At the current trajectory, the GPT-6 training run costs $6bn in 2028, and GPT-7 costs $130bn in 2031.

I think that makes GPT-8 a couple of trillion in 2034.

You're right that if you wanted to train GPT-8 in 2031 instead, then it would cost roughly 500x more than training GPT-7 that year.
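The arithmetic behind these figures can be checked with a rough sketch. It assumes the cost ratio between the quoted GPT-6 and GPT-7 estimates carries over unchanged to the next generation, and that the 500x effective-FLOP multiplier maps directly onto cost within a single year. The variable and function names are illustrative and do not come from the linked calculations.

```python
# Figures quoted in the comment, in billions of USD.
GPT6_COST_BN = 6.0     # training run in 2028
GPT7_COST_BN = 130.0   # training run in 2031

# Effective-FLOP multiplier per generation, as stated in the comment.
EFFECTIVE_FLOP_PER_GENERATION = 500.0

# Assumption: the generation-to-generation cost ratio stays constant.
cost_ratio = GPT7_COST_BN / GPT6_COST_BN  # about 21.7x per generation


def extrapolate_cost_bn(generations_after_gpt7: int) -> float:
    """Extrapolate training cost, assuming a constant cost ratio per generation."""
    return GPT7_COST_BN * cost_ratio ** generations_after_gpt7


# GPT-8 on the same trajectory (2034): roughly 2,800 billion, i.e. a couple of trillion.
gpt8_cost_2034_bn = extrapolate_cost_bn(1)

# GPT-8 trained in 2031 instead: no three-year efficiency gain, so the full
# 500x effective-FLOP multiplier is assumed to show up as cost.
gpt8_cost_2031_bn = GPT7_COST_BN * EFFECTIVE_FLOP_PER_GENERATION

# Implied fall in cost per effective FLOP over one three-year generation.
implied_efficiency_gain = EFFECTIVE_FLOP_PER_GENERATION / cost_ratio

print(f"Cost ratio per generation: {cost_ratio:.1f}x")
print(f"GPT-8 in 2034: ${gpt8_cost_2034_bn / 1000:.1f} trillion")
print(f"GPT-8 in 2031: ${gpt8_cost_2031_bn / 1000:.1f} trillion")
print(f"Implied efficiency gain per generation: {implied_efficiency_gain:.1f}x")
```

Under these assumptions, waiting three years is what turns a 500x jump in effective FLOP into a roughly 22x jump in cost.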

davidmanheim on Short Timelines don't Devalue Long Horizon Research

That seems correct, but I don't think any of those aren't useful to investigate with AI, despite the relatively higher bar.

mruwnik on Why Have Sentence Lengths Decreased?

Having studied Latin, or other such classical training, seems to be but one method of imbuing oneself with the style of writing longer, more complicated sentences. Personally I acquired the taste for such eccentricities perusing sundry works from earlier times. Romances, novels and other such frivolities from, or set in, the 18th century being the main culprits.

I suppose this sort of proves your point, in that those authors learnt to create complicated sentences from learning Latin, and the later writers copied the style, thinking either that it's fun, correct, or wanting to seem more authentic.

christiankl on Short Timelines don't Devalue Long Horizon Research

The key is still to distinguish good from bad ideas.

In the linked post, you essentially make the argument that "Whole brain emulation artificial intelligence is safer than LLM-based artificial superintelligence". That's a claim that might be true or not true. One aspect of spending more time with that idea would be to think more critically about whether that's true.

However, even if it would be true, it wouldn't help in a scenario where we already have LLM-based artificial superintelligence.

vanessa-kosoy on New Paper: Infra-Bayesian Decision-Estimation Theory

Thank you <3

Any chance of more exposition for those of us less cognitively-inclined? =)

Read the paper! :)

It might seem long at first glance, but all the results are explained in the first 13 pages, the rest is just proofs. If you don't care about the examples, you can stop on page 11. Naturally, I welcome any feedback on the exposition there.