LessWrong 2.0 Reader

Pick two: concise, comprehensive, or clear rules
Screwtape · 2025-02-03T06:39:05.815Z · comments (21)
[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (18)
o3-mini Early Days
Zvi · 2025-02-03T14:20:06.443Z · comments (0)
[link] Meta: Frontier AI Framework
Zach Stein-Perlman · 2025-02-03T22:00:17.103Z · comments (0)
[question] Can we infer the search space of a local optimiser?
Lucius Bushnaq (Lblack) · 2025-02-03T10:17:01.661Z · answers+comments (1)
$300 Fermi Model Competition
ozziegooen · 2025-02-03T19:47:09.270Z · comments (4)
[link] Keeping Capital is the Challenge
LTM · 2025-02-03T02:04:27.142Z · comments (1)
Tear Down the Burren
jefftk (jkaufman) · 2025-02-04T03:40:02.767Z · comments (0)
[link] Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog)
Archimedes · 2025-02-04T02:55:44.401Z · comments (0)
Alignment Can Reduce Performance on Simple Ethical Questions
drhens · 2025-02-03T19:35:42.895Z · comments (2)
New Foresight Longevity Bio & Molecular Nano Grants Program
Allison Duettmann (allison-duettmann) · 2025-02-04T00:28:30.147Z · comments (0)
Sleeper agents appear resilient to activation steering
Lucy Wingard (lucy-wingard) · 2025-02-03T19:31:30.702Z · comments (0)
Use computers as powerful as in 1985 or AI controls humans or ?
jrincayc (nerd_gatherer) · 2025-02-03T00:51:05.706Z · comments (0)
[link] Eliezer Yudkowsky on The Trajectory Podcast
Filipe · 2025-02-03T23:44:24.590Z · comments (0)
The Overlap Paradigm: Rethinking Data's Role in Weak-to-Strong Generalization (W2SG)
Serhii Zamrii (aligning_bias) · 2025-02-03T19:31:55.282Z · comments (0)
The Outer Levels
Jerdle (daniel-amdurer) · 2025-02-03T14:30:29.230Z · comments (0)
eliminating bias through language?
KvmanThinking (avery-liu) · 2025-02-04T01:52:01.508Z · comments (0)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space
Roman Malov · 2025-02-03T10:30:48.866Z · comments (0)
Visualizing Interpretability
Darold Davis (darold) · 2025-02-03T19:36:38.938Z · comments (0)
Can someone, anyone, make superintelligence a more concrete concept?
Ori Nagel (ori-nagel) · 2025-02-04T02:18:51.718Z · comments (0)
[link] What are the "no free lunch" theorems?
Vishakha (vishakha-agrawal) · 2025-02-04T02:02:18.423Z · comments (1)
Part 1: Enhancing Inner Alignment in CLIP Vision Transformers: Mitigating Reification Bias with SAEs and Grad ECLIP
Gilber A. Corrales (mysticdeepai) · 2025-02-03T19:30:52.505Z · comments (0)
How AGI Defines Its Self
Davey Morse (davey-morse) · 2025-02-03T18:47:22.287Z · comments (0)
[link] Language Models and World Models, a Philosophy
kyjohnso · 2025-02-03T02:55:36.577Z · comments (0)
Stopping unaligned LLMs is easy!
Yair Halberstadt (yair-halberstadt) · 2025-02-03T15:38:27.083Z · comments (9)
Gettier Cases [repost]
Antigone (luke-st-clair) · 2025-02-03T18:12:22.253Z · comments (1)
A "base process" conceptually "below" any "base" universes
Amy Johnson (Amy Minge) · 2025-02-03T19:11:22.706Z · comments (1)
The Self-Reference Trap in Mathematics
Alister Munday (alister-munday) · 2025-02-03T16:12:21.392Z · comments (21)