LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

"What the hell is a representation, anyway?" | Clarifying AI interpretability with tools from philosophy of cognitive science | Part 1: Vehicles vs. contents
IwanWilliams · 2024-06-09T14:19:28.322Z · comments (0)
[link] Exploring Llama-3-8B MLP Neurons
ntt123 (thong-nguyen) · 2024-06-09T14:19:10.822Z · comments (0)
Demystifying "Alignment" through a Comic
milanrosko · 2024-06-09T08:24:22.454Z · comments (18)
Dumbing down
Martin Sustrik (sustrik) · 2024-06-09T06:50:47.469Z · comments (0)
What if a tech company forced you to move to NYC?
KatjaGrace · 2024-06-09T06:30:03.329Z · comments (20)
[question] What should I do? (long term plan about starting an AI lab)
not_a_cat · 2024-06-09T00:45:12.369Z · answers+comments (1)
[link] Searching for the Root of the Tree of Evil
Ivan Vendrov (ivan-vendrov) · 2024-06-08T17:05:53.950Z · comments (14)
2. Corrigibility Intuition
Max Harms (max-harms) · 2024-06-08T15:52:29.971Z · comments (2)
Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (34)
I made an AI safety fellowship. What I wish I knew.
Ruben Castaing (ruben-castaing) · 2024-06-08T15:23:26.618Z · comments (0)
Alignment Gaps
kcyras · 2024-06-08T15:23:16.396Z · comments (3)
The Slack Double Crux, or how to negotiate with yourself
Thac0 · 2024-06-08T15:22:58.533Z · comments (2)
The Perils of Popularity: A Critical Examination of LessWrong's Rational Discourse
BubbaJoeLouis · 2024-06-08T15:22:08.458Z · comments (3)
[link] Status quo bias is usually justified
Amadeus Pagel (amadeus-pagel) · 2024-06-08T14:54:13.648Z · comments (3)
Closed-Source Evaluations
Jono (lw-user0246) · 2024-06-08T14:18:40.800Z · comments (3)
Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)
[question] Why don't we just get rid of all the bioethicists?
Sable · 2024-06-08T03:48:47.760Z · answers+comments (0)
Sev, Sevteen, Sevty, Sevth
jefftk (jkaufman) · 2024-06-08T02:30:02.040Z · comments (9)
1. The CAST Strategy
Max Harms (max-harms) · 2024-06-07T22:29:13.005Z · comments (10)
0. CAST: Corrigibility as Singular Target
Max Harms (max-harms) · 2024-06-07T22:29:12.934Z · comments (9)
What is space? What is time?
Tahp · 2024-06-07T22:15:55.951Z · comments (3)
[question] Question about Lewis' counterfactual theory of causation
jbkjr · 2024-06-07T20:15:27.561Z · answers+comments (7)
Relationships among words, metalingual definition, and interpretability
Bill Benzon (bill-benzon) · 2024-06-07T19:18:18.389Z · comments (0)
[link] Let’s Talk About Emergence
jacobhaimes · 2024-06-07T19:18:16.382Z · comments (0)
D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues
aphyer · 2024-06-07T19:02:06.859Z · comments (14)
Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)
Situational Awareness Summarized - Part 2
Joe Rogero · 2024-06-07T17:20:03.513Z · comments (0)
Frida van Lisa, a short story about adversarial AI attacks on humans
arisAlexis (arisalexis) · 2024-06-07T13:22:38.118Z · comments (0)
Quotes from Leopold Aschenbrenner’s Situational Awareness Paper
Zvi · 2024-06-07T11:40:03.981Z · comments (8)
LessWrong/ACX meetup Transylvania tour - Cluj Napoca
Marius Adrian Nicoară · 2024-06-07T05:45:14.578Z · comments (0)
[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (15)
Offering Completion
jefftk (jkaufman) · 2024-06-07T01:40:02.137Z · comments (6)
A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)
Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)
Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (26)
Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)
Humming is not a free $100 bill
Elizabeth (pktechgirl) · 2024-06-06T20:10:02.457Z · comments (5)
[link] There Are No Primordial Definitions of Man/Woman
ymeskhout · 2024-06-06T19:30:43.930Z · comments (0)
Situational Awareness Summarized - Part 1
Joe Rogero · 2024-06-06T18:59:59.409Z · comments (0)
[link] [Link Post] "Foundational Challenges in Assuring Alignment and Safety of Large Language Models"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-06-06T18:55:09.151Z · comments (2)
AI #67: Brief Strange Trip
Zvi · 2024-06-06T18:50:03.514Z · comments (6)
The Human Biological Advantage Over AI
Wstewart · 2024-06-06T18:18:03.301Z · comments (2)
An evaluation of Helen Toner’s interview on the TED AI Show
PeterH · 2024-06-06T17:39:40.800Z · comments (2)
The Impossibility of a Rational Intelligence Optimizer
Nicolas Villarreal (nicolas-villarreal) · 2024-06-06T16:14:03.481Z · comments (5)
Immunization against harmful fine-tuning attacks
domenicrosati · 2024-06-06T15:17:42.495Z · comments (0)
SB 1047 Is Weakened
Zvi · 2024-06-06T13:40:41.547Z · comments (4)
Weeping Agents
pleiotroth · 2024-06-06T12:18:54.978Z · comments (2)
Podcast: Center for AI Policy, on AI risk and listening to AI researchers
KatjaGrace · 2024-06-06T03:30:03.950Z · comments (0)
Calculating Natural Latents via Resampling
johnswentworth · 2024-06-06T00:37:42.127Z · comments (4)
SAEs Discover Meaningful Features in the IOI Task
Alex Makelov (amakelov) · 2024-06-05T23:48:04.808Z · comments (0)