LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

Cyborg Periods: There will be multiple AI transitions
Jan_Kulveit · 2023-02-22T16:09:04.858Z · comments (9)
The Open Agency Model
Eric Drexler · 2023-02-22T10:35:12.316Z · comments (18)
Intervening in the Residual Stream
MadHatter · 2023-02-22T06:29:37.973Z · comments (1)
What do language models know about fictional characters?
skybrian · 2023-02-22T05:58:43.130Z · comments (0)
Power-Seeking = Minimising free energy
Jonas Hallgren · 2023-02-22T04:28:44.075Z · comments (10)
[link] The shallow reality of 'deep learning theory'
Jesse Hoogland (jhoogland) · 2023-02-22T04:16:11.216Z · comments (11)
Candyland is Terrible
jefftk (jkaufman) · 2023-02-22T01:50:03.375Z · comments (2)
A proof of inner Löb's theorem
James Payor (JamesPayor) · 2023-02-21T21:11:41.183Z · comments (0)
Fighting For Our Lives - What Ordinary People Can Do
TinkerBird · 2023-02-21T20:36:32.579Z · comments (18)
The Emotional Type of a Decision
moridinamael · 2023-02-21T20:35:17.276Z · comments (0)
What is it like doing AI safety work?
KatWoods (ea247) · 2023-02-21T20:12:01.977Z · comments (2)
Pretraining Language Models with Human Preferences
Tomek Korbak (tomek-korbak) · 2023-02-21T17:57:09.774Z · comments (18)
A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my dissertation)
Joe Carlsmith (joekc) · 2023-02-21T17:26:12.981Z · comments (15)
EIS X: Continual Learning, Modularity, Compression, and Biological Brains
scasper · 2023-02-21T16:59:42.438Z · comments (4)
No Room for Political Philosophy
Arturo Macias (arturo-macias) · 2023-02-21T16:11:38.010Z · comments (7)
Deceptive Alignment is <1% Likely by Default
DavidW (david-wheaton) · 2023-02-21T15:09:27.920Z · comments (26)
AI #1: Sydney and Bing
Zvi · 2023-02-21T14:00:00.480Z · comments (44)
You're not a simulation, 'cause you're hallucinating
Stuart_Armstrong · 2023-02-21T12:12:21.889Z · comments (6)
Basic facts about language models during training
beren · 2023-02-21T11:46:12.256Z · comments (14)
[link] [Preprint] Pretraining Language Models with Human Preferences
Giulio (thesofakillers) · 2023-02-21T11:44:27.423Z · comments (0)
Breaking the Optimizer’s Curse, and Consequences for Existential Risks and Value Learning
Roger Dearnaley · 2023-02-21T09:05:43.010Z · comments (1)
[link] Medlife Crisis: "Why Do People Keep Falling For Things That Don't Work?"
RomanHauksson (r) · 2023-02-21T06:22:23.608Z · comments (5)
A foundation model approach to value inference
sen · 2023-02-21T05:09:29.658Z · comments (0)
Instrumentality makes agents agenty
porby · 2023-02-21T04:28:57.190Z · comments (4)
Gamified narrow reverse imitation learning
TekhneMakre · 2023-02-21T04:26:45.792Z · comments (0)
Feelings are Good, Actually
Gordon Seidoh Worley (gworley) · 2023-02-21T02:38:11.793Z · comments (1)
AI alignment researchers don't (seem to) stack
So8res · 2023-02-21T00:48:25.186Z · comments (40)
EA & LW Forum Weekly Summary (6th - 19th Feb 2023)
Zoe Williams (GreyArea) · 2023-02-21T00:26:33.146Z · comments (0)
What to think when a language model tells you it's sentient
Robbo · 2023-02-21T00:01:54.585Z · comments (6)
On second thought, prompt injections are probably examples of misalignment
lc · 2023-02-20T23:56:33.571Z · comments (5)
Nothing Is Ever Taught Correctly
LVSN · 2023-02-20T22:31:50.917Z · comments (3)
Behavioral and mechanistic definitions (often confuse AI alignment discussions)
LawrenceC (LawChan) · 2023-02-20T21:33:01.499Z · comments (5)
Validator models: A simple approach to detecting goodharting
beren · 2023-02-20T21:32:25.957Z · comments (1)
There are no coherence theorems
Dan H (dan-hendrycks) · 2023-02-20T21:25:48.478Z · comments (115)
[question] Are there any AI safety relevant fully remote roles suitable for someone with 2-3 years of machine learning engineering industry experience?
Malleable_shape · 2023-02-20T19:57:12.955Z · answers+comments (2)
A circuit for Python docstrings in a 4-layer attention-only transformer
StefanHex (Stefan42) · 2023-02-20T19:35:14.027Z · comments (8)
Sydney the Bingenator Can't Think, But It Still Threatens People
Valentin Baltadzhiev (valentin-baltadzhiev) · 2023-02-20T18:37:44.500Z · comments (2)
EIS IX: Interpretability and Adversaries
scasper · 2023-02-20T18:25:43.641Z · comments (7)
What AI companies can do today to help with the most important century
HoldenKarnofsky · 2023-02-20T17:00:10.531Z · comments (3)
[link] Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky
bayesed · 2023-02-20T16:42:07.413Z · comments (54)
[link] Speculative Technologies launch and Ben Reinhardt AMA
jasoncrawford · 2023-02-20T16:33:56.964Z · comments (0)
[link] [MLSN #8] Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming
Dan H (dan-hendrycks) · 2023-02-20T15:54:13.791Z · comments (0)
Bing finding ways to bypass Microsoft's filters without being asked. Is it reproducible?
Christopher King (christopher-king) · 2023-02-20T15:11:28.538Z · comments (15)
Metaculus Introduces New 'Conditional Pair' Forecast Questions for Making Conditional Predictions
ChristianWilliams · 2023-02-20T13:36:19.649Z · comments (0)
On Investigating Conspiracy Theories
Zvi · 2023-02-20T12:50:00.891Z · comments (38)
The Estimation Game: a monthly Fermi estimation web app
Sage Future (aaron-ho-1) · 2023-02-20T11:33:04.736Z · comments (2)
The idea that ChatGPT is simply “predicting” the next word is, at best, misleading
Bill Benzon (bill-benzon) · 2023-02-20T11:32:06.635Z · comments (87)
Russell Conjugations list & voting thread
Daniel Kokotajlo (daniel-kokotajlo) · 2023-02-20T06:39:44.021Z · comments (62)
Emergent Deception and Emergent Optimization
jsteinhardt · 2023-02-20T02:40:09.912Z · comments (0)
AGI doesn't need understanding, intention, or consciousness in order to kill us, only intelligence
James Blaha (james-blaha) · 2023-02-20T00:55:34.329Z · comments (2)