LessWrong 2.0 Reader

next page (older posts) →

[link] An ML paper on data stealing provides a construction for "gradient hacking"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-07-30T21:44:37.310Z · comments (1)
[link] Open Source Automated Interpretability for Sparse Autoencoder Features
kh4dien · 2024-07-30T21:11:36.866Z · comments (1)
[link] Caterpillars and Philosophy
Zero Contradictions · 2024-07-30T20:54:06.921Z · comments (0)
[link] François Chollet on the limitations of LLMs in reasoning
2PuNCheeZ · 2024-07-30T20:04:12.271Z · comments (1)
[link] Against AI As An Existential Risk
Noah Birnbaum (daniel-birnbaum) · 2024-07-30T19:10:41.156Z · comments (13)
[question] Is objective morality self-defeating?
dialectica (bithov@icloud.com) · 2024-07-30T18:23:06.432Z · answers+comments (3)
Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning
Tom Angsten (tom-angsten) · 2024-07-30T16:36:06.518Z · comments (0)
Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (43)
Investigating the Ability of LLMs to Recognize Their Own Writing
Christopher Ackerman (christopher-ackerman) · 2024-07-30T15:41:44.017Z · comments (0)
Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)
RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)
If You Can Climb Up, You Can Climb Down
jefftk (jkaufman) · 2024-07-30T00:00:06.295Z · comments (9)
[link] What is Morality?
Zero Contradictions · 2024-07-29T19:19:57.119Z · comments (0)
Arch-anarchism and immortality
Peter lawless · 2024-07-29T18:10:59.270Z · comments (1)
[link] AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering
Corin Katzke (corin-katzke) · 2024-07-29T17:50:52.454Z · comments (1)
[link] New Blog Post Against AI Doom
Noah Birnbaum (daniel-birnbaum) · 2024-07-29T17:21:29.633Z · comments (5)
An Interpretability Illusion from Population Statistics in Causal Analysis
Daniel Tan (dtch1997) · 2024-07-29T14:50:19.497Z · comments (3)
[question] How tokenization influences prompting?
Boris Kashirin (boris-kashirin) · 2024-07-29T10:28:25.056Z · answers+comments (4)
Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)
Prediction Markets Explained
Benjamin_Sturisky · 2024-07-29T08:02:40.943Z · comments (0)
San Francisco ACX Meetup “First Saturday”
Nate Sternberg (nate-sternberg) · 2024-07-29T06:11:01.165Z · comments (2)
Relativity Theory for What the Future 'You' Is and Isn't
FlorianH (florian-habermacher) · 2024-07-29T02:01:17.736Z · comments (48)
Wittgenstein and Word2vec: Capturing Relational Meaning in Language and Thought
cleanwhiteroom · 2024-07-28T19:55:17.247Z · comments (2)
Making Beliefs Pay Rent
Screwtape · 2024-07-28T17:59:52.101Z · comments (2)
This is already your second chance
Malmesbury (Elmer of Malmesbury) · 2024-07-28T17:13:57.680Z · comments (13)
[question] Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?
kaler · 2024-07-28T12:23:40.671Z · answers+comments (14)
[link] Family and Society
Zero Contradictions · 2024-07-28T07:05:55.899Z · comments (0)
[question] What is AI Safety’s line of retreat?
Remmelt (remmelt-ellen) · 2024-07-28T05:43:05.021Z · answers+comments (12)
AXRP Episode 34 - AI Evaluations with Beth Barnes
DanielFilan · 2024-07-28T03:30:07.192Z · comments (0)
Rats, Back a Candidate
Blake (blake-1) · 2024-07-28T03:19:14.217Z · comments (19)
[link] AI existential risk probabilities are too unreliable to inform policy
Oleg Trott (oleg-trott) · 2024-07-28T00:59:59.497Z · comments (5)
[link] Idle Speculations on Pipeline Parallelism
DaemonicSigil · 2024-07-27T22:40:12.543Z · comments (0)
[link] Re: Anthropic's suggested SB-1047 amendments
RobertM (T3t) · 2024-07-27T22:32:39.447Z · comments (13)
[link] The problem with psychology is that it has no theory.
Nicholas D. (nicholas-d) · 2024-07-27T19:36:44.601Z · comments (7)
Bryan Johnson and a search for healthy longevity
NancyLebovitz · 2024-07-27T15:28:13.117Z · comments (17)
[link] What are matching markets?
ohmurphy · 2024-07-27T15:05:28.647Z · comments (0)
Safety consultations for AI lab employees
Zach Stein-Perlman · 2024-07-27T15:00:27.276Z · comments (4)
[link] The Case Against UBI
Zero Contradictions · 2024-07-27T06:36:01.957Z · comments (2)
[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)
Utilitarianism and the replaceability of desires and attachments
MichaelStJules · 2024-07-27T01:57:42.419Z · comments (2)
Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)
My Experience Using Gamification
Wyatt S (wyatt-s) · 2024-07-26T23:06:53.392Z · comments (4)
How the AI safety technical landscape has changed in the last year, according to some practitioners
tlevin (trevor) · 2024-07-26T19:06:47.126Z · comments (6)
A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)
Unaligned AI is coming regardless.
verbalshadow · 2024-07-26T16:41:11.608Z · comments (3)
Index of rationalist groups in the Bay Area July 2024
Lucie Philippon (lucie-philippon) · 2024-07-26T16:32:25.337Z · comments (10)
[link] End Single Family Zoning by Overturning Euclid V Ambler
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-26T14:08:45.046Z · comments (1)
Common Uses of "Acceptance"
Yi-Yang (yiyang) · 2024-07-26T11:18:30.719Z · comments (5)
Universal Basic Income and Poverty
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-07-26T07:23:50.151Z · comments (131)
A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (1)