LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained
Zeping Yu · 2023-12-26T00:36:50.326Z · comments (1)
[question] Anki setup best practices?
Sinclair Chen (sinclair-chen) · 2023-12-25T22:34:34.639Z · answers+comments (4)
[question] Why does expected utility matter?
Marco Discendenti (marco-discendenti) · 2023-12-25T14:47:46.656Z · answers+comments (21)
Freeze Dried Raspberry Truffles
jefftk (jkaufman) · 2023-12-25T14:10:06.336Z · comments (0)
Pornographic and semi-pornographic ads on mainstream websites as an instance of the AI alignment problem?
greenrd · 2023-12-25T13:19:57.026Z · comments (5)
Defense Against The Dark Arts: An Introduction
Lyrongolem (david-xiao) · 2023-12-25T06:36:06.278Z · comments (36)
[link] Occlusions of Moral Knowledge
herschel (hrs) · 2023-12-25T05:55:16.529Z · comments (0)
[question] Would you have a baby in 2024?
martinkunev · 2023-12-25T01:52:04.358Z · answers+comments (53)
[link] align your latent spaces
bhauth · 2023-12-24T16:30:09.138Z · comments (8)
Viral Guessing Game
jefftk (jkaufman) · 2023-12-24T13:10:11.917Z · comments (0)
The Sugar Alignment Problem
Adam Zerner (adamzerner) · 2023-12-24T01:35:20.226Z · comments (3)
A Crisper Explanation of Simulacrum Levels
Thane Ruthenis · 2023-12-23T22:13:52.286Z · comments (13)
Hyperbolic Discounting and Pascal’s Mugging
Andrew Keenan Richardson (qemqemqem) · 2023-12-23T21:55:27.091Z · comments (0)
[link] AISN #28: Center for AI Safety 2023 Year in Review
aogara (Aidan O'Gara) · 2023-12-23T21:31:40.767Z · comments (1)
"Inftoxicity" and other new words to describe malicious information and communication thereof
Jáchym Fibír · 2023-12-23T18:15:50.369Z · comments (6)
AI's impact on biology research: Part I, today
octopocta · 2023-12-23T16:29:18.056Z · comments (6)
[link] AI Girlfriends Won't Matter Much
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-23T15:58:30.308Z · comments (22)
The Next Right Token
jefftk (jkaufman) · 2023-12-23T03:20:07.131Z · comments (0)
Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:46:25.892Z · comments (0)
Fact Finding: How to Think About Interpreting Memorisation (Post 4)
Senthooran Rajamanoharan (SenR) · 2023-12-23T02:46:16.675Z · comments (0)
Fact Finding: Trying to Mechanistically Understand Early MLPs (Post 3)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:46:05.517Z · comments (0)
Fact Finding: Simplifying the Circuit (Post 2)
Senthooran Rajamanoharan (SenR) · 2023-12-23T02:45:49.675Z · comments (3)
Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:44:24.270Z · comments (4)
Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)
[link] How does a toy 2-digit subtraction transformer predict the difference?
Evan Anders (evan-anders) · 2023-12-22T21:17:30.331Z · comments (0)
Thoughts on Max Tegmark's AI verification
Johannes C. Mayer (johannes-c-mayer) · 2023-12-22T20:38:31.566Z · comments (0)
Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)
Thane Ruthenis · 2023-12-22T20:19:13.865Z · comments (13)
AI safety advocates should consider providing gentle pushback following the events at OpenAI
civilsociety · 2023-12-22T18:55:12.920Z · comments (5)
"Destroy humanity" as an immediate subgoal
Seth Ahrenbach (seth-ahrenbach) · 2023-12-22T18:52:40.427Z · comments (13)
Synthetic Restrictions
nano_brasca (ignacio-brasca) · 2023-12-22T18:50:07.511Z · comments (0)
Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · 2023-12-22T18:48:55.983Z · comments (11)
Open positions: Research Analyst at the AI Standards Lab
Koen.Holtman · 2023-12-22T16:31:45.215Z · comments (0)
[link] The problems with the concept of an infohazard as used by the LW community [Linkpost]
Noosphere89 (sharmake-farah) · 2023-12-22T16:13:54.822Z · comments (43)
Employee Incentives Make AGI Lab Pauses More Costly
nikola (nikolaisalreadytaken) · 2023-12-22T05:04:15.598Z · comments (12)
The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)
[link] The absence of self-rejection is self-acceptance
Chipmonk · 2023-12-21T21:54:52.116Z · comments (1)
A Decision Theory Can Be Rational or Computable, but Not Both
StrivingForLegibility · 2023-12-21T21:02:45.366Z · comments (4)
Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)
Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)
[link] Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI
Erich_Grunewald · 2023-12-21T17:24:16.713Z · comments (2)
[link] "Alignment" is one of six words of the year in the Harvard Gazette
nikola (nikolaisalreadytaken) · 2023-12-21T15:54:04.682Z · comments (1)
AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)
[link] Rating my AI Predictions
Robert_AIZI · 2023-12-21T14:07:50.052Z · comments (5)
AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)
On OpenAI’s Preparedness Framework
Zvi · 2023-12-21T14:00:05.144Z · comments (4)
Prediction Markets aren't Magic
SimonM · 2023-12-21T12:54:07.754Z · comments (29)
[question] Why is capnometry biofeedback not more widely known?
riceissa · 2023-12-21T02:42:05.665Z · answers+comments (22)
My best guess at the important tricks for training 1L SAEs
Arthur Conmy (arthur-conmy) · 2023-12-21T01:59:06.208Z · comments (4)
Seattle Winter Solstice
a7x · 2023-12-20T20:30:35.299Z · comments (1)
What Would a Utopia-Maximizer Look Like?
Thane Ruthenis · 2023-12-20T20:01:18.079Z · comments (23)