LessWrong 2.0 Reader


Reviewing LessWrong: Screwtape's Basic Answer
Screwtape · 2025-02-05T04:30:34.347Z · comments (0)
[question] Why isn't AI containment the primary AI safety strategy?
OKlogic · 2025-02-05T03:54:58.171Z · answers+comments (0)
[link] We Fell For It
Nicholas / Heather Kross (NicholasKross) · 2025-02-05T03:07:43.175Z · comments (0)
Nick Land: Orthogonality
lumpenspace (lumpen-space) · 2025-02-04T21:07:04.947Z · comments (3)
What working on AI safety taught me about B2B SaaS sales
purple fire (jack-edwards) · 2025-02-04T20:50:19.990Z · comments (3)
Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker
Daniel Herrmann (Whispermute) · 2025-02-04T20:34:22.625Z · comments (2)
Anti-Slop Interventions?
abramdemski · 2025-02-04T19:50:29.127Z · comments (12)
Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails
Devina Jain (devina-jain) · 2025-02-04T19:10:13.933Z · comments (0)
[question] Journalism student looking for sources
pinkerton · 2025-02-04T18:58:49.740Z · answers+comments (0)
We’re in Deep Research
Zvi · 2025-02-04T17:20:06.540Z · comments (2)
[link] The Capitalist Agent
henophilia · 2025-02-04T15:32:39.694Z · comments (5)
[link] Forecasting AGI: Insights from Prediction Markets and Metaculus
Alvin Ånestrand (alvin-anestrand) · 2025-02-04T13:03:45.927Z · comments (0)
Ruling Out Lookup Tables
Alfred Harwood · 2025-02-04T10:39:34.899Z · comments (5)
Half-baked idea: a straightforward method for learning environmental goals?
Q Home · 2025-02-04T06:56:31.813Z · comments (0)
Information Versus Action
Screwtape · 2025-02-04T05:13:55.192Z · comments (0)
Utilitarian AI Alignment: Building a Moral Assistant with the Constitutional AI Method
Clément L · 2025-02-04T04:15:36.917Z · comments (0)
Tear Down the Burren
jefftk (jkaufman) · 2025-02-04T03:40:02.767Z · comments (2)
[link] Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog)
Archimedes · 2025-02-04T02:55:44.401Z · comments (0)
Can someone, anyone, make superintelligence a more concrete concept?
Ori Nagel (ori-nagel) · 2025-02-04T02:18:51.718Z · comments (3)
[link] What are the "no free lunch" theorems?
Vishakha (vishakha-agrawal) · 2025-02-04T02:02:18.423Z · comments (1)
eliminating bias through language?
KvmanThinking (avery-liu) · 2025-02-04T01:52:01.508Z · comments (1)
New Foresight Longevity Bio & Molecular Nano Grants Program
Allison Duettmann (allison-duettmann) · 2025-02-04T00:28:30.147Z · comments (0)
[link] Meta: Frontier AI Framework
Zach Stein-Perlman · 2025-02-03T22:00:17.103Z · comments (2)
$300 Fermi Model Competition
ozziegooen · 2025-02-03T19:47:09.270Z · comments (4)
Visualizing Interpretability
Darold Davis (darold) · 2025-02-03T19:36:38.938Z · comments (0)
Alignment Can Reduce Performance on Simple Ethical Questions
Daan Henselmans (drhens) · 2025-02-03T19:35:42.895Z · comments (7)
The Overlap Paradigm: Rethinking Data's Role in Weak-to-Strong Generalization (W2SG)
Serhii Zamrii (aligning_bias) · 2025-02-03T19:31:55.282Z · comments (0)
Sleeper agents appear resilient to activation steering
Lucy Wingard (lucy-wingard) · 2025-02-03T19:31:30.702Z · comments (0)
Part 1: Enhancing Inner Alignment in CLIP Vision Transformers: Mitigating Reification Bias with SAEs and Grad ECLIP
Gilber A. Corrales (mysticdeepai) · 2025-02-03T19:30:52.505Z · comments (0)
A "base process" conceptually "below" any "base" universes
Amy Johnson (Amy Minge) · 2025-02-03T19:11:22.706Z · comments (2)
How AGI Defines Its Self
Davey Morse (davey-morse) · 2025-02-03T18:47:22.287Z · comments (4)
Gettier Cases [repost]
Antigone (luke-st-clair) · 2025-02-03T18:12:22.253Z · comments (4)
The Self-Reference Trap in Mathematics
Alister Munday (alister-munday) · 2025-02-03T16:12:21.392Z · comments (22)
Stopping unaligned LLMs is easy!
Yair Halberstadt (yair-halberstadt) · 2025-02-03T15:38:27.083Z · comments (11)
The Outer Levels
Jerdle (daniel-amdurer) · 2025-02-03T14:30:29.230Z · comments (1)
o3-mini Early Days
Zvi · 2025-02-03T14:20:06.443Z · comments (0)
[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (20)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space
Roman Malov · 2025-02-03T10:30:48.866Z · comments (0)
[question] Can we infer the search space of a local optimiser?
Lucius Bushnaq (Lblack) · 2025-02-03T10:17:01.661Z · answers+comments (5)
Pick two: concise, comprehensive, or clear rules
Screwtape · 2025-02-03T06:39:05.815Z · comments (27)
[link] Language Models and World Models, a Philosophy
kyjohnso · 2025-02-03T02:55:36.577Z · comments (0)
[link] Keeping Capital is the Challenge
LTM · 2025-02-03T02:04:27.142Z · comments (2)
Use computers as powerful as in 1985 or AI controls humans or ?
jrincayc (nerd_gatherer) · 2025-02-03T00:51:05.706Z · comments (0)
Some Theses on Motivational and Directional Feedback
abstractapplic · 2025-02-02T22:50:04.270Z · comments (1)
Humanity Has A Possible 99.98% Chance Of Extinction
st3rlxx · 2025-02-02T21:46:49.620Z · comments (1)
Exploring how OthelloGPT computes its world model
JMaar (jim-maar) · 2025-02-02T21:29:09.433Z · comments (0)
An Introduction to Evidential Decision Theory
Babić · 2025-02-02T21:27:35.684Z · comments (1)
"DL training == human learning" is a bad analogy
kman · 2025-02-02T20:59:21.259Z · comments (0)
Conditional Importance in Toy Models of Superposition
james__p · 2025-02-02T20:35:38.655Z · comments (1)
Tracing Typos in LLMs: My Attempt at Understanding How Models Correct Misspellings
Ivan Dostal (#R@q0YSDZ3ov$f6J) · 2025-02-02T19:56:34.771Z · comments (0)
next page (older posts) →