LessWrong 2.0 Reader

Back to Basics: Truth is Unitary
lsusr · 2024-03-29T21:10:33.399Z · comments (13)

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)

[link] Non-alignment project ideas for making transformative AI go well
Lukas Finnveden (Lanrian) · 2024-01-04T07:23:13.658Z · comments (1)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

Userscript to always show LW comments in context vs at the top
Vlad Sitalo (harcisis) · 2023-11-21T17:53:30.418Z · comments (8)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas (gytis-daujotas) · 2024-08-01T21:08:38.800Z · comments (6)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (20)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

[link] Amazon to invest up to $4 billion in Anthropic
Davis_Kingsley · 2023-09-25T14:55:35.983Z · comments (8)

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (10)

New intro textbook on AIXI
Alex_Altair · 2024-05-11T18:18:50.945Z · comments (8)

AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)

Commonsense Good, Creative Good
jefftk (jkaufman) · 2023-09-27T19:50:07.486Z · comments (11)

What does davidad want from «boundaries»?
Chipmonk · 2024-02-06T17:45:42.348Z · comments (1)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

On Trust
johnswentworth · 2023-12-06T19:19:07.680Z · comments (26)

Truthseeking, EA, Simulacra levels, and other stuff
Elizabeth (pktechgirl) · 2023-10-27T23:56:49.198Z · comments (12)

AI #38: Let’s Make a Deal
Zvi · 2023-11-16T19:50:05.442Z · comments (2)

[link] Fluent dreaming for language models (AI interpretability method)
tbenthompson (ben-thompson) · 2024-02-06T06:02:59.296Z · comments (4)

[link] EPUBs of MIRI Blog Archives and selected LW Sequences
mesaoptimizer · 2023-10-26T14:17:11.538Z · comments (6)

[link] Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost]
Akash (akash-wasil) · 2023-11-01T13:28:43.723Z · comments (4)

Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)

Apply to LASR Labs: a London-based technical AI safety research programme
Erin Robertson · 2024-04-09T17:34:06.847Z · comments (1)

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Zack_M_Davis · 2024-01-09T23:12:20.349Z · comments (31)

Auditing failures vs concentrated failures
ryan_greenblatt · 2023-12-11T02:47:35.703Z · comments (0)

ProLU: A Nonlinearity for Sparse Autoencoders
Glen Taggart · 2024-04-23T14:09:21.592Z · comments (4)

[question] Where is the Town Square?
Gretta Duleba (gretta-duleba) · 2024-02-13T03:53:18.205Z · answers+comments (8)

[link] An EPUB of Arbital's AI Alignment section
mesaoptimizer · 2023-10-16T19:36:29.109Z · comments (1)

[link] How bad is chlorinated water?
bhauth · 2023-12-13T18:00:12.640Z · comments (18)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld
DanielFilan · 2023-10-03T21:50:07.552Z · comments (0)

Childhood and Education Roundup #4
Zvi · 2024-01-30T13:50:06.033Z · comments (10)

The Case for Predictive Models
Rubi J. Hudson (Rubi) · 2024-04-03T18:22:20.243Z · comments (7)

2023 LessWrong Community Census, Request for Comments
Screwtape · 2023-11-01T16:32:19.102Z · comments (37)

My intellectual journey to (dis)solve the hard problem of consciousness
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-06T09:32:41.612Z · comments (41)

Incidental polysemanticity
Victor Lecomte (victor-lecomte) · 2023-11-15T04:00:00.000Z · comments (7)

[link] Project ideas: Epistemics
Lukas Finnveden (Lanrian) · 2024-01-05T23:41:23.721Z · comments (4)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (3)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

Ambiguity in Prediction Market Resolution is Still Harmful
aphyer · 2024-07-31T20:32:40.217Z · comments (17)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (50)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)
