LessWrong 2.0 Reader

ARENA 2.0 - Impact Report
CallumMcDougall (TheMcDouglas) · 2023-09-26T17:13:19.952Z · comments (5)
Mechanistic Interpretability Reading group
1stuserhere (firstuser-here) · 2023-09-26T16:26:44.757Z · comments (0)
Announcing the CNN Interpretability Competition
scasper · 2023-09-26T16:21:50.276Z · comments (0)
Making AIs less likely to be spiteful
Nicolas Macé (NicolasMace) · 2023-09-26T14:12:06.202Z · comments (2)
[link] [Linkpost] Mark Zuckerberg confronted about Meta's Llama 2 AI's ability to give users detailed guidance on making anthrax - Business Insider
mic (michael-chen) · 2023-09-26T12:05:57.396Z · comments (11)
Enforcing Far-Future Contracts for Governments
FCCC · 2023-09-26T04:26:46.442Z · comments (49)
Carioca Petrov Day
Giskard (tiago-macedo) · 2023-09-26T00:30:36.906Z · comments (0)
[question] A few Alignment questions: utility optimizers, SLT, sharp left turn and identifiability
Igor Timofeev (igor-timofeev-1) · 2023-09-26T00:27:23.229Z · answers+comments (1)
Impact stories for model internals: an exercise for interpretability researchers
jenny · 2023-09-25T23:15:29.189Z · comments (3)
[link] Autonomic Sanity
Sable · 2023-09-25T22:37:07.262Z · comments (9)
[question] What is wrong with this "utility switch button problem" approach?
Donald Hobson (donald-hobson) · 2023-09-25T21:36:47.166Z · answers+comments (3)
You should just smile at strangers a lot
chaosmage · 2023-09-25T20:12:56.907Z · comments (10)
[link] The King and the Golem
Richard_Ngo (ricraz) · 2023-09-25T19:51:22.980Z · comments (15)
[link] Public Opinion on AI Safety: AIMS 2023 and 2021 Summary
Jacy Reese Anthis (Jacy Reese) · 2023-09-25T18:55:41.532Z · comments (2)
Welcome to Apply: The 2024 Vitalik Buterin Fellowships in AI Existential Safety by FLI!
Zhijing Jin · 2023-09-25T18:42:13.320Z · comments (2)
Evaluating hidden directions on the utility dataset: classification, steering and removal
Annah (annah) · 2023-09-25T17:19:13.988Z · comments (3)
Linkpost: A model of biases as arising from meta-beliefs
JuanGarcia · 2023-09-25T17:14:55.538Z · comments (0)
[question] What causes a decision theory to be used?
Dagon · 2023-09-25T16:33:36.161Z · answers+comments (2)
[link] Understanding strategic deception and deceptive alignment
Marius Hobbhahn (marius-hobbhahn) · 2023-09-25T16:27:47.357Z · comments (16)
[link] The Merits of Contrarianism & Why I hate Chatbots. [My Experience with the Ideological Turing Test @ a Less Wrong meetup]
Amina V. (aminah-vinson) · 2023-09-25T16:13:04.113Z · comments (1)
Inside Views, Impostor Syndrome, and the Great LARP
johnswentworth · 2023-09-25T16:08:17.040Z · comments (53)
“X distracts from Y” as a thinly-disguised fight over group status / politics
Steven Byrnes (steve2152) · 2023-09-25T15:18:18.644Z · comments (14)
[link] Amazon to invest up to $4 billion in Anthropic
Davis_Kingsley · 2023-09-25T14:55:35.983Z · comments (8)
Should Effective Altruists be Valuists instead of utilitarians?
spencerg · 2023-09-25T14:03:10.958Z · comments (3)
Feedly Breaks MathML
jefftk (jkaufman) · 2023-09-25T13:40:05.759Z · comments (3)
[question] How have you become more hard-working?
Chi Nguyen · 2023-09-25T12:37:39.860Z · answers+comments (40)
Automating Intelligence: A Cursory Glance at How AutoML Brings Precision to AI Development
[deleted] · 2023-09-25T09:39:31.338Z · comments (0)
[link] Categorization Hell
UtilityMonster (Matt Goldwater) · 2023-09-24T18:18:03.136Z · comments (0)
Interpreting OpenAI's Whisper
EllenaR · 2023-09-24T17:53:44.955Z · comments (10)
Contradiction Appeal Bias
onur · 2023-09-24T17:03:58.724Z · comments (2)
RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%!
Singularian2501 (maik-zywitza) · 2023-09-24T16:48:18.360Z · comments (0)
Honor System for Vaccination?
jefftk (jkaufman) · 2023-09-24T11:50:05.809Z · comments (22)
Far-Future Commitments as a Policy Consensus Strategy
FCCC · 2023-09-24T06:34:55.505Z · comments (40)
Five neglected work areas that could reduce AI risk
[deleted] · 2023-09-24T02:03:29.829Z · comments (5)
[question] Are the other Rationality: A-Z sequences coming out as books?
caffeinated_dissonance (alex-goldstein) · 2023-09-24T00:38:51.939Z · answers+comments (2)
The Dick Kick'em Paradox
Augs SMSHacks (augs-smshacks) · 2023-09-23T22:22:06.827Z · comments (21)
I designed an AI safety course (for a philosophy department)
Eleni Angelou (ea-1) · 2023-09-23T22:03:00.036Z · comments (15)
[link] Paper: LLMs trained on “A is B” fail to learn “B is A”
lberglund (brglnd) · 2023-09-23T19:55:53.427Z · comments (73)
Sparse Coding, for Mechanistic Interpretability and Activation Engineering
David Udell · 2023-09-23T19:16:31.772Z · comments (7)
[question] Places to meet interesting middle-aged men?
anon_girl · 2023-09-23T19:06:48.829Z · answers+comments (7)
Taking features out of superposition with sparse autoencoders more quickly with informed initialization
Pierre Peigné (pierre-peigne) · 2023-09-23T16:21:42.799Z · comments (8)
A quick remark on so-called “hallucinations” in LLMs and humans
Bill Benzon (bill-benzon) · 2023-09-23T12:17:26.600Z · comments (4)
Hand-writing MathML
jefftk (jkaufman) · 2023-09-23T11:20:07.870Z · comments (40)
Musk, Starlink, and Crimea
NicholasKross · 2023-09-23T02:35:02.623Z · comments (0)
[link] [Linkpost/Video] All The Times We Nearly Blew Up The World
Jacob G-W (g-w1) · 2023-09-23T01:18:03.008Z · comments (1)
Luck based medicine: inositol for anxiety and brain fog
Elizabeth (pktechgirl) · 2023-09-22T20:10:07.117Z · comments (5)
If influence functions are not approximating leave-one-out, how are they supposed to help?
Fabien Roger (Fabien) · 2023-09-22T14:23:45.847Z · comments (4)
Modeling p(doom) with TrojanGDP
K. Liam Smith (Liam Smith) · 2023-09-22T14:19:31.437Z · comments (2)
Let's talk about Impostor syndrome in AI safety
Igor Ivanov (igor-ivanov) · 2023-09-22T13:51:18.482Z · comments (4)
Fund Transit With Development
jefftk (jkaufman) · 2023-09-22T11:10:05.645Z · comments (22)